METHOD OF IDENTIFYING AND DEVELOPING DRUG LEADS



BACKGROUND OF THE INVENTION

Field of the Invention 

This invention relates to an improvement in the art of 
using combinatorial chemistry to develop drug leads. 

Description of the Background Art 

Drug Discovery. The human genomics effort could yield
gene sequences that code for as many as 70,000 proteins, each
a potential drug target; microbial genomics will increase this
number further. Unfortunately, since genomic studies identify
genes, but not the biological activity of the corresponding
proteins, it is likely that many of the genes will prove to
encode proteins whose activation or inactivation has no effect
on disease progression (Gold, et al., J. Nature Biotech.,
15:297, 1997). There is therefore a need for a method of
determining which proteins are most likely to be productive
targets for pharmacological intervention.

Even if one knew in advance the perhaps 10,000 proteins
which could be considered interesting targets, there would remain
the problem of efficiently screening hundreds of thousands of
possible drugs for a useful activity against these 10,000
targets.

Historically, acquiring chemical compound libraries has 
been a barrier to the entry of smaller firms into the drug 
discovery arena. Due to the large quantity of chemical 
required for testing on whole animals and even on cells in 
culture, it was a given that whenever a compound was 
synthesized it should be done in fairly large quantity. Thus, 
there was a synthesis and purification throughput of less than 
50 compounds per chemist per year. Large companies maintained 
their immensely valuable collections as trade barriers. 
However, with the downsizing of targets to the molecular level 
and the automation of screens, the quantity of a given compound 
necessary for an assay has been reduced to very small amounts. 
These changes have opened the door for the utilization of so-called
combinatorial chemistry libraries in lieu of the
traditional chemical libraries. Combinatorial chemistry 
permits the rapid and relatively inexpensive synthesis of large 
numbers of compounds in the small quantities suitable for 
automated assays directed at molecular targets. Numerous small
companies and academic laboratories have successfully
engineered combinatorial chemical libraries with a significant
range of diversity (reviewed in Doyle, 1995; Gordon et al.,
1994a; Gordon et al., 1994b).

Combinatorial Libraries. In a combinatorial library, chemical
building blocks are randomly combined into a large number (as
high as 10^15) of different compounds, which are then
simultaneously screened for binding (or other) activity against
one or more targets.

Libraries of thousands, even millions, of random
oligopeptides have been prepared by chemical synthesis
(Houghten et al., Nature, 354:84-6 (1991)) or gene expression
(Marks et al., J Mol Biol, 222:581-97 (1991)), displayed on
chromatographic supports (Lam et al., Nature, 354:82-4 (1991)),
inside bacterial cells (Colas et al., Nature, 380:548-550
(1996)), on bacterial pili (Lu, Bio/Technology, 13:366-372
(1990)), or on phage (Smith, Science, 228:1315-7 (1985)), and
screened for binding to a variety of targets including
antibodies (Valadon et al., J Mol Biol, 261:11-22 (1996)),
cellular proteins (Schmitz et al., J Mol Biol, 260:664-677
(1996)), viral proteins (Hong and Boulanger, EMBO J,
14:4714-4727 (1995)), bacterial proteins (Jacobsson and
Frykberg, Biotechniques, 18:878-885 (1995)), nucleic acids
(Cheng et al., Gene, 171:1-8 (1996)), and plastic (Siani et al.,
J Chem Inf Comput Sci, 34:588-593 (1994)).

Libraries of proteins (Ladner, USP 4,664,989), peptoids
(Simon et al., Proc Natl Acad Sci USA, 89:9367-71 (1992)),
nucleic acids (Ellington and Szostak, Nature, 346:818 (1990)),
carbohydrates, and small organic molecules (Eichler et al., Med
Res Rev, 15:481-96 (1995)) have also been prepared or suggested
for drug screening purposes.

WO 99/06839                                   PCT/US98/15943

The first combinatorial libraries were composed of
peptides or proteins, in which all or selected amino acid 
positions were randomized. Peptides and proteins can exhibit 
high and specific binding activity, and can act as catalysts.
In consequence, they are of great importance in biological
systems. Unfortunately, peptides per se have limited utility
as therapeutic entities. They are costly to
synthesize, unstable in the presence of proteases, and in
general do not transit cellular membranes. Other classes of
compounds have better properties as drug candidates.

Nucleic acids have also been used in combinatorial
libraries. Their great advantage is the ease with which a
nucleic acid with appropriate binding activity can be
amplified. As a result, combinatorial libraries composed of
nucleic acids can be of low redundancy and hence of high
diversity. However, the resulting oligonucleotides are not
suitable as drugs for several reasons. First, the
oligonucleotides have high molecular weights and cannot be
synthesized conveniently in large quantities. Second, because
oligonucleotides are polyanions, they do not cross cell
membranes. Finally, deoxy- and ribonucleotides are
hydrolytically digested by nucleases that occur in all living
systems and are therefore usually decomposed before reaching
the target.

There has therefore been much interest in combinatorial
libraries based on small molecules, which are more suited to
pharmaceutical use, especially those which, like
benzodiazepines, belong to a chemical class which has already
yielded useful pharmacological agents. The techniques of
combinatorial chemistry have been recognized as the most
efficient means for finding small molecules that act on these
targets (3). At present, small molecule combinatorial
chemistry involves the synthesis of either pooled or discrete
molecules that present varying arrays of functionality on a
common scaffold (4). These compounds are grouped in libraries
that are then screened against the target of interest either
for binding or for inhibition of biological activity.
Libraries containing hundreds of thousands of compounds are now
being routinely synthesized; however, screening these large
libraries for binding or inhibition against all 10,000 potential
targets cannot reasonably be accomplished with present
screening technologies, and there are numerous experimental and
computational strategies under development to reduce the number
of compounds that must be screened for each target (5-8).

Information-intensive drug discovery. As pointed out by
Paterson, et al., J. Med. Chem., 39:3049-59 (1996), medicinal
chemistry advances through the dual processes of "lead
discovery" and "lead optimization". In "lead discovery", the
objective is the discovery of an "activity island", a
chemical class with a high frequency of active molecules
(this class may be defined mathematically as a volume within
a multidimensional space defined by various molecular
descriptors). In "lead optimization", the "activity island"
is explored in detail. Since each compound synthesized and tested
can be considered as a probe of a "neighborhood" of similar
compounds, in "lead discovery" it is inefficient to test
compounds whose neighborhoods overlap.
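The lead-discovery rule above (do not spend assays on compounds whose neighborhoods overlap) can be illustrated with a greedy sphere-exclusion pick. This is a minimal sketch under stated assumptions, not the method of Paterson et al.: each compound is taken to be a numeric descriptor vector, distance is plain Euclidean distance, and the neighborhood radius and the example library are hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def sphere_exclusion(compounds, radius):
    """Greedy selection of compounds whose neighborhoods do not overlap.

    Each compound's neighborhood is modeled (an assumption) as a ball of the
    given radius around its descriptor vector. Two such balls overlap when
    their centers are closer than 2 * radius, so a candidate is kept only if
    it lies at least that far from every compound already picked.
    """
    picked = []
    for name, vector in compounds.items():
        if all(euclidean(vector, compounds[p]) >= 2 * radius for p in picked):
            picked.append(name)
    return picked

# Hypothetical library: two descriptors per compound.
library = {
    "c1": (0.0, 0.0),
    "c2": (0.1, 0.1),  # inside c1's neighborhood, so redundant for lead discovery
    "c3": (3.0, 0.0),
    "c4": (3.2, 0.2),  # inside c3's neighborhood
    "c5": (0.0, 4.0),
}
print(sphere_exclusion(library, radius=1.0))  # ['c1', 'c3', 'c5']
```

For lead optimization the logic inverts: one would deliberately test the neighbors (c2, c4) of an active compound.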

Paralleling the recent advancements in genomics and
molecular biology has been a revolution in information
technology, which includes relational databases, computer
graphics, and neural networks (1,3). These capabilities permit
the construction of databases of descriptors that describe
either compounds or targets in quantitative terms, and these
descriptors can be related to make predictions about the
compounds, their biological activities, and the
targets they act on (5-8).

Structure descriptors can be based on a variety of
structural features. These approaches provide arrays of
molecular descriptors that can be used to assess the similarity
of molecules in a library.

Paterson et al. (1996) ranked 11 molecular diversity
descriptors according to their utility in defining a
neighborhood region. In order of increasing usefulness, these
were random numbers = log P = MR < strain energy < connectivity
indices < 2D fingerprints (whole molecule) = atom pairs =
autocorrelation indices < steric CoMFA = 2D fingerprints (side
chain only) = H-bonding CoMFA fields. The authors note that
any group of individual diversity descriptors can be combined
into one composite descriptor by analogy to Euclidean distance,
where the composite distance is the square root of the weighted
sum of the squares of the individual descriptor distances. They
suggest autoscaling the vector by dividing each individual
descriptor by its observed standard deviation.
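The composite descriptor just described can be written as d(a, b) = sqrt(sum_i w_i (a_i - b_i)^2), computed after each descriptor is divided by its standard deviation. The sketch below is illustrative only. It assumes the sample standard deviation, equal default weights, and a hypothetical two-descriptor library; none of these choices are specified by Paterson et al.

```python
import math
import statistics

def autoscale(library):
    """Divide each descriptor by its sample standard deviation across the library.

    A descriptor with zero spread is left unscaled (divisor 1.0) to avoid
    division by zero; this guard is an illustrative choice.
    """
    columns = list(zip(*library.values()))
    sds = [statistics.stdev(col) or 1.0 for col in columns]
    return {name: [x / sd for x, sd in zip(vec, sds)]
            for name, vec in library.items()}

def composite_distance(a, b, weights=None):
    """Square root of the weighted sum of squared per-descriptor differences."""
    if weights is None:
        weights = [1.0] * len(a)
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))

# Hypothetical descriptors on very different scales: (log P, molar refractivity).
raw = {"A": [1.0, 40.0], "B": [2.0, 60.0], "C": [3.0, 80.0]}
scaled = autoscale(raw)
print(composite_distance(raw["A"], raw["B"]))        # about 20.02: MR dominates
print(composite_distance(scaled["A"], scaled["B"]))  # about 1.414: equal footing
```

Without autoscaling, whichever descriptor has the largest numeric range swamps the others; after scaling, the weights alone decide each descriptor's influence.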
Klebe and Abraham, J. Med. Chem., 36:70-80 (1993),
explored the ability of comparative molecular field analysis
(CoMFA) to predict new biologically active compounds on the
basis of [...] postulating a drug-protein interaction [...]
steric and electrostatic [...]. The authors [...] enthalpies or
binding constants could be predicted by this method for
molecules binding to human rhinovirus 14, thermolysin, and
renin.

To facilitate the selection of structures from large
compound sets for combinatorial chemical synthesis and high
throughput automated bioassays, Cummins, et al., J. Chem. Inf.
Comput. Sci., 36:750-63 (1996), first calculated 109 different
structural descriptors for each of 300,000 compounds
(these were taken from two databases of commercially available
compounds). [...] On the basis of [...] and normality of
distribution, they winnowed the list down to a final set of
[...] descriptors (one physical property, the free energy of
solvation, and [...] topological indices) [...] in which each
compound could be placed. Factor analysis [...] reduced the
dimensionality of the data; only four factors were needed to
explain 90% of the variance [...]; [...] values were greater
than unity.

Compounds from two databases of biologically active compounds
(the Comprehensive Medicinal Chemistry database and the
"MACCS-II Drug Data Report") were then mapped into the same
descriptor space. The authors suggested that compounds
in the commercially available compound database which, in
descriptor space, overlap with the descriptor space of the
biologically active compounds, are of interest for drug
development.
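The dimensionality-reduction step above can be approximated as follows: compute the correlation matrix of the descriptor columns, find its eigenvalues, and keep the factors whose eigenvalue exceeds one. This is a hedged sketch. It substitutes principal-component eigenvalues for the factor analysis that Cummins et al. describe, reads "values greater than unity" as the eigenvalue-greater-than-one rule (an interpretation), and uses a tiny hypothetical descriptor table. The Jacobi routine is a plain textbook eigenvalue method written in pure Python so the block has no dependencies.

```python
import math

def pearson(x, y):
    """Pearson correlation; assumes neither column is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_matrix(rows):
    """Correlation matrix of descriptor columns (rows are compounds)."""
    cols = list(zip(*rows))
    return [[pearson(ci, cj) for cj in cols] for ci in cols]

def jacobi_eigenvalues(a, tol=1e-12, max_sweeps=50):
    """Eigenvalues of a real symmetric matrix by cyclic Jacobi rotations."""
    a = [row[:] for row in a]
    n = len(a)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle chosen so that the new a[p][q] is zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # columns p and q: A <- A P
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # rows p and q: A <- P^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)

def retained_factors(rows):
    """Eigenvalues above 1 and the fraction of total variance they explain."""
    eig = jacobi_eigenvalues(correlation_matrix(rows))
    kept = [e for e in eig if e > 1.0]
    return kept, sum(kept) / sum(eig)

# Hypothetical table: 4 compounds x 4 descriptors, built from two
# independent underlying properties (columns 1-2 and columns 3-4).
rows = [[1, 2, 1, 2], [2, 4, -1, -2], [3, 6, -1, -2], [4, 8, 1, 2]]
kept, fraction = retained_factors(rows)
print(len(kept), round(fraction, 3))  # 2 1.0
```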

Matter, J. Med. Chem., 40:1219-29 (1997), was interested
in selecting an ensemble of nonredundant compounds for mass
screening. The author asked, "Which physicochemical measure
of similarity correlates with biological properties?" Matter
evaluated a variety of molecular descriptors, including 2D
fingerprints, atom-pair fingerprints, topological 2D
descriptors, autocorrelation functions for atomic properties,
flexible 3D fingerprints, molecular shape descriptors, and WHIM
(weighted holistic invariant molecular) indices. Cluster
analysis was performed on 1283 biologically active
compounds, in 55 bioactivity classes, from the IndexChemicus93
database. The ability of each descriptor to predict the
biological activity of one compound based on its similarity
(measured by that descriptor) to another compound was evaluated
by a chi-squared statistical test. The 2D fingerprint
descriptors were found to be the most useful in making
predictions.
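One plausible way to frame the chi-squared evaluation above is a 2x2 table: compound pairs are split into "similar" or "dissimilar" by a descriptor threshold and into "same" or "different" bioactivity class. The sketch assumes exactly that design. Matter's actual test setup is not reproduced, the pair counts are hypothetical, and every row and column total is assumed to be non-zero.

```python
def contingency(pairs):
    """Build a 2x2 table from (is_similar, same_activity_class) booleans.

    Rows: similar / dissimilar by the descriptor; columns: same / different class.
    """
    table = [[0, 0], [0, 0]]
    for similar, same_class in pairs:
        table[0 if similar else 1][0 if same_class else 1] += 1
    return table

def chi_squared_2x2(table):
    """Pearson chi-squared statistic (no continuity correction), 1 degree of freedom."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    return stat

CRITICAL_1DF_0_05 = 3.841  # chi-squared critical value, 1 df, alpha = 0.05

# Hypothetical tallies for one descriptor.
pairs = ([(True, True)] * 30 + [(True, False)] * 10
         + [(False, True)] * 20 + [(False, False)] * 40)
table = contingency(pairs)
stat = chi_squared_2x2(table)
print(table, round(stat, 3), stat > CRITICAL_1DF_0_05)  # [[30, 10], [20, 40]] 16.667 True
```

A descriptor whose "similar" pairs share an activity class far more often than chance gives a large statistic; repeating this for each descriptor gives one basis for ranking them.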

Such information is useful in combinatorial chemistry,
because in optimizing leads, testing similar compounds is
productive, while in discovering leads, testing similar
compounds is wasteful (7). Therefore, a quantitative
description of the similarities of compounds based on their
structures (and, necessarily, a quantitative understanding of
what "similar" means) can be used to direct efficient drug
discovery. Indeed, quantitative structure-activity
relationships show that many of these descriptors can in fact
be correlated with the biological activity of the compounds in
the library.

An exciting recent application of this approach has been
described by the National Cancer Institute (NCI) for the molecular
pharmacology of cancer (23). In this approach, there are three
related databases.

The "activity" database (A) contains the activities
against 60 cell lines for 60,000 compounds that have been
screened at NCI. The similarity in the activity profile
against the panel of cell lines can then be calculated for any
two compounds, and is generally assessed by a pairwise
correlation coefficient (PCC), which is determined by an
algorithm called COMPARE that calculates the similarity of
all of the compounds in the database to a user-supplied "seed"
compound.
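The seed-based comparison described above amounts to computing a Pearson correlation between the seed's cell-line profile and every other compound's profile, then sorting. The sketch below shows only that core idea; it is not the NCI COMPARE program, and the profiles, compound names, and five-cell-line panel are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def rank_by_seed(activity, seed):
    """Rank every other compound by the PCC of its cell-line profile with the seed's."""
    seed_profile = activity[seed]
    scores = [(name, pearson(seed_profile, profile))
              for name, profile in activity.items() if name != seed]
    return sorted(scores, key=lambda item: item[1], reverse=True)

# Hypothetical activity values over five cell lines (the NCI panel has 60).
activity = {
    "seed": [5.0, 6.0, 7.0, 5.5, 6.5],
    "cmpA": [4.0, 5.0, 6.0, 4.5, 5.5],  # same pattern, shifted: PCC = 1
    "cmpB": [7.0, 6.0, 5.0, 6.5, 5.5],  # mirror image: PCC = -1
    "cmpC": [6.0, 6.0, 6.5, 6.0, 6.1],
}
for name, r in rank_by_seed(activity, "seed"):
    print(name, round(r, 3))
```

Because PCC ignores overall potency offsets, cmpA ranks first even though it is uniformly less active than the seed; only the shape of the profile matters.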

The "target" database (T) has been created for 100
proteins (targets) whose level of expression was determined in
the same 60 cell lines. These expression levels were assessed
by measuring either the quantity of expressed protein (e.g., by
Western blots or immunocytochemistry) or the quantity of
messenger RNA (e.g., by quantitative PCR or Northern blots) for
each protein in each cell line. Relating the A and T databases
then provides information on the molecular pharmacology of the
compounds in A; inhibition of one of the heavily expressed
proteins emerges as a possible mechanism for the activity of
the compound.
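Relating A to T can be pictured as correlating one compound's activity profile with each target's expression profile across the same cell lines; targets whose expression tracks activity become candidate mechanisms. This is a hedged illustration. It uses Spearman rank correlation (an assumption, chosen because activity and expression measurements may be related monotonically rather than linearly), and all data and target names are hypothetical.

```python
import math

def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)

def candidate_mechanisms(compound_profile, expression):
    """Rank targets by how closely their expression tracks the compound's activity."""
    scores = [(target, spearman(compound_profile, levels))
              for target, levels in expression.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)

# Hypothetical: one compound's activity and four targets' expression in five cell lines.
activity = [1.0, 2.0, 3.0, 4.0, 5.0]
expression = {
    "T1": [10.0, 20.0, 30.0, 40.0, 50.0],
    "T2": [5.0, 4.0, 3.0, 2.0, 1.0],
    "T3": [3.0, 1.0, 4.0, 1.0, 5.0],
    "T4": [1.0, 4.0, 9.0, 16.0, 25.0],  # monotone but nonlinear: Spearman = 1
}
print(candidate_mechanisms(activity, expression))
```

A high correlation only flags a hypothesis; it does not by itself show that the compound acts on that target.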

Finally, a "structure" database (S) has been compiled that
contains structural descriptors for a library of 460,000
compounds that includes the compounds in A. Similarities
between the structural descriptors can be calculated for all
of the compounds in S, so that, for a given active compound in
A, unscreened but structurally similar compounds can be
identified as having an increased likelihood of being active in
the cell lines for which the screened compounds are active. The
latter process therefore [...] after a compound with
a given biological activity has been identified. The NCI
approach in defining the target database (T) is significantly
[...] biological activity assays.

For protein targets, comparable descriptors cannot be directly
calculated from the amino acid sequence, because the
three-dimensional structure is not known and the residues that
comprise the binding site are not known. We describe here a
database for protein targets that will have the same predictive
value as chemical library databases in predicting similarities
between proteins; however, the quantitative descriptors will
be determined by in vitro experiments rather than from
calculation.

While there are numerous computational approaches for in
vitro typing of small molecules based on their chemical
structures (5-8), there are no analogous experimental or
theoretical methods for obtaining in vitro information on
protein targets that can be used to relate similar proteins,
outside of actually screening small molecules (S).

Kauvar, et al., Chemistry & Biology, 2:107-118 (1995),
"fingerprinted" over 5,000 compounds by the binding potency
(concentration needed to inhibit 50% of the protein's activity)
of each compound to each member of a reference panel of eight
proteins. (These proteins were selected on the basis of
readily assayable activity, broad cross-reactivity with small
organic molecules, and low correlation between each other in
binding patterns.) A screening library of 54 compounds was
then selected based on the diversity in their "fingerprints"
(inhibitory activity against the reference panel proteins).

This "training set" was used to evaluate the similarity
of the ligand binding characteristics of a new protein to those
of the reference panel proteins. By regression analysis, a
computational surrogate (a weighted sum of two or more
reference panel proteins) for the new protein is determined.
The ability of all fingerprinted compounds to inhibit the
activity of the new protein is predicted as the sum of their
appropriately weighted inhibitory activities against the
component reference proteins of the computational surrogate.
Predictions may be improved by testing additional sets of
compounds against the new protein. See also L. M. Kauvar and
H. O. Villar, "Method to identify binding partners," US Patent
5,587,293.
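The computational surrogate described above can be sketched as an ordinary least-squares fit: the new protein's measured activities on the training set are regressed onto the reference panel's activities for the same compounds, and the fitted weights then predict activity for any fingerprinted compound. This is an illustrative sketch, not Kauvar's published procedure. It assumes no intercept (a pure weighted sum, as the text describes), uses all panel members rather than a selected subset, solves the normal equations with a small hand-written eliminator, and uses hypothetical data built so the true weights are known.

```python
def solve_linear(a, b):
    """Solve a small dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_surrogate(train_fingerprints, train_activity):
    """Least-squares weights w with activity ~ sum_j w_j * reference_j (no intercept)."""
    p = len(train_fingerprints[0])
    xtx = [[sum(row[i] * row[j] for row in train_fingerprints) for j in range(p)]
           for i in range(p)]
    xty = [sum(row[i] * y for row, y in zip(train_fingerprints, train_activity))
           for i in range(p)]
    return solve_linear(xtx, xty)

def predict(fingerprint, weights):
    """Predicted activity against the new protein: weighted sum of panel activities."""
    return sum(w * f for w, f in zip(weights, fingerprint))

# Hypothetical training set: rows are compounds, columns are two reference proteins.
train = [[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [4.0, 3.0]]
new_protein = [1.3, 1.7, 3.6, 3.7]  # constructed as 0.7 * ref1 + 0.3 * ref2
weights = fit_surrogate(train, new_protein)
print([round(w, 3) for w in weights])          # [0.7, 0.3]
print(round(predict([5.0, 6.0], weights), 3))  # 5.3
```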

In one embodiment of the present method, proteins are 
fingerprinted on the basis of their chemical reactivity, in the 
presence or absence of a binding partner, rather than on the 
basis of their biological activity. Therefore, we do not need 
to identify a large number of diverse affinity molecules for
each reference protein. 

In another embodiment, proteins are fingerprinted on the
basis of their affinity for peptides or nucleic acids in a
high-diversity library. This library provides a far greater
range of conformational variation than is provided by Kauvar's
training set.

Biomolecule reactivity. Chemical reactions that modify
nucleotides have been extremely successful in probing the
structures of complex DNA and RNA molecules (...7). In these
studies, a reagent that oxidizes nucleic acids by a particular
reaction pathway is used to create backbone lesions in the
polyanion. The sites of modification are then determined by
high-resolution gel electrophoresis. These sites then indicate
where the reactive functionality on the nucleotide (e.g.,
guanine N7 or deoxyribose C4') is exposed to the solution.
Despite the success of these studies in defining complex
nucleic acid structures, these concepts have not been used to
define protein structures, primarily because the amide backbone
is far less readily cleaved than the phosphodiester backbone.
In other words, most reactions that damage nucleotides result
in cleavage of the phosphodiester backbone, but reactions that
damage amino acid residues do not cleave the amide backbone.
Thus, high-resolution gel electrophoresis cannot be used to map
the sites where amino acids are reactive toward a given reagent
and therefore exposed to solution.

The diplatinum compound Pt2(pop)4 has been used to footprint
DNA (Breiner, K. M., M. A. Daugherty, T. G. Oas and H. H.
Thorp (1995), "An Anionic Diplatinum DNA Photocleavage Agent:
Chemical Mechanism and Footprinting of Lambda Repressor,"
J. Am. Chem. Soc. 117, 11673-11679).

In "Modification of protein surface hydrophobicity and
methionine oxidation by oxidative systems," Proc. Natl. Acad.
Sci. USA [...], it was shown that the surface hydrophobicity
of rat liver proteins increases with animal age, and that in
vitro exposure of such proteins to a metal-catalysed oxidation
system (ascorbate/Fe(II)/hydrogen peroxide), or to a peroxyl
radical-generating system (AAPH: 2,2'-azobis(2-amidinopropane)
dihydrochloride), leads to an increase in surface
hydrophobicity, protein carbonyl content, and conversion of
methionine to methionine sulfoxide. The [...] system also
resulted in oxidation of tryptophan residues [...]
precipitation. Changes in surface hydrophobicity were detected
by means of the change in protein fluorescence (490 nm)
associated with binding of ANSA [...].

[...] oxidation, although their susceptibilities vary greatly.
Methionine residues are among the most susceptible (with
susceptibility generally [...] their degree of surface
exposure). Since methionine sulfoxide reductase, which converts
methionine sulfoxide back to methionine residues, is
widespread, it has been suggested that the surface methionine
residues act as an antioxidant defense, scavenging oxidants
before they can attack residues critical to structure or
function. Levine et al. oxidized glutamine synthetase by a
metal-catalysed oxidation system and found that a significant
number of methionine residues could be oxidized without an
[...] susceptibility (Levine et al., 1996). [...] number of
surface [...] protein, or that the chemical reactivity of a
ligand-protein complex may be used as a descriptor of that
ligand and its protein binding site.

All references, including any patents or patent applications,
cited in this specification are hereby incorporated by
reference. No admission is made that any reference constitutes
prior art. The discussion of the references states what their
authors assert, and the applicants reserve the right to
challenge the accuracy and pertinency of the cited documents.
SUMMARY OF THE INVENTION

In one embodiment of the present invention, a prospective
"query" protein, usually of unknown structure, is characterized
by a "reactivity descriptor", by which it is related to other
proteins (the "reference" or "database" proteins) for which
both a "reactivity descriptor" and one or more drug leads are
known. A combinatorial chemical library enriched in, or even
limited to, chemical compounds similar to the drug leads
previously identified for the related proteins in the database
is then synthesized and screened for binding activity against
the query protein. [...] of chemical compounds [...] protein
target, and/or increases the likelihood of "hits".

The reactivity descriptors here contemplated relate to the
reactivity of the target protein (the term "target protein"
refers to both "query" and "reference" proteins), both in the
presence and in the absence of a ligand, toward one or more
chemical reagents. For a given reagent, the difference in the
two reactivities is [...] of the regions of the protein which
[...] as a result of ligand binding; these regions may or may
not include the actual ligand binding site. One may also
compare the reactivity in the bound state with that of the
target protein in the unfolded state, and that of the protein
in a free but folded state with that which it has in the
unfolded state. These comparisons provide further information
about the structure of the protein.
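A reactivity descriptor of the kind just described can be sketched as one number per reagent: the change in the protein's reactivity when the ligand is bound. Query and reference proteins can then be compared by the similarity of those vectors. In this sketch the reagent labels follow Figure 1, but everything else is an assumption: the reactivity values are hypothetical, the descriptor is taken as the bound-minus-free difference, and cosine similarity is used because it compares the pattern of changes regardless of their overall magnitude. The source does not specify a similarity measure.

```python
import math

REAGENTS = ("RuO2+", "Pt2(pop)4", "Ru(bpy)3 3+")  # oxidants named in Figure 1

def reactivity_descriptor(free, bound):
    """Per-reagent change in reactivity on ligand binding (bound minus free)."""
    return [b - f for f, b in zip(free, bound)]

def cosine(u, v):
    """Cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def most_similar(query, references):
    """Reference proteins ordered by cosine similarity of their descriptors to the query's."""
    scores = [(name, cosine(query, d)) for name, d in references.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)

# Hypothetical relative reactivities (one value per reagent) in the free and bound states.
query = reactivity_descriptor(free=[1.0, 0.8, 0.5], bound=[0.4, 0.8, 0.3])
references = {
    "refX": reactivity_descriptor(free=[2.0, 1.0, 1.0], bound=[0.8, 1.0, 0.6]),
    "refY": reactivity_descriptor(free=[1.0, 1.0, 1.0], bound=[1.0, 0.5, 1.2]),
}
print(dict(zip(REAGENTS, query)))
print(most_similar(query, references))
```

Here refX shows the same pattern of protection (reduced reactivity toward the first and third reagents) at twice the magnitude, so it scores as the closest reference protein.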

The ligands used to define the binding site of the protein
may be the natural ligands therefor, if known, or they may be
surrogate ligands [...] the natural ligand [...]. Surrogate
ligands may be obtained by screening combinatorial libraries.
Since the surrogate ligand will be used only for gathering
structural information, and not as a drug per se, the library
may be chosen from the point of view of obtaining the most
structural diversity for the least synthetic effort, ignoring
the suitability of the library members as drugs. For this
reason, a preferred surrogate ligand library is a peptide
("BioKey") or oligonucleotide library.

In another embodiment of the invention, a query protein
is related to reference proteins by its ability to bind similar
surrogate ligands. A combinatorial oligomeric library of
surrogate ligands, typically peptides or nucleic acids, is
screened, and the oligomers which bind the target protein (and
thus are called "aptamers") are characterized to yield
aptamer descriptors. The aptamer descriptors (sequences
and, possibly, additional information such as contact points
and secondary structure) identified for the query protein are
compared to those identified for the database proteins, and the
drug leads previously identified for the database proteins
characterized by the most similar surrogate ligands are
favored.

In one embodiment, the aptamers are nucleic acids, and the
bases involved in protein binding are determined by
footprinting techniques. By taking into account the predicted
secondary structure of the nucleic acids, the epitope of the
nucleic acid may be characterized as a sequence whose elements
are unpaired G, A, T, or C, or any of the sixteen possible
pairings (matched or mismatched) of those four bases. This
sequence, for each surrogate ligand binding a reference
protein, is compared with the corresponding sequences for the
surrogate ligands binding the query protein.
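The epitope encoding above uses a 20-symbol alphabet: 4 unpaired bases plus 16 ordered base pairings. The sketch below converts a footprinted region to that alphabet and then compares two epitopes. Several choices are hypothetical illustrations rather than anything the source specifies: the secondary structure is supplied in dot-bracket notation, each paired position is written as its own base followed by its partner's base, and similarity is scored by a simple global (Needleman-Wunsch) alignment with +1/-1/-1 scoring. The sequences and footprints are also hypothetical.

```python
def pair_partners(structure):
    """Map each paired position to its partner from dot-bracket notation."""
    stack, partner = [], {}
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            j = stack.pop()
            partner[i], partner[j] = j, i
    if stack:
        raise ValueError("unbalanced structure")
    return partner

def encode_epitope(sequence, structure, footprint):
    """Symbols for the footprinted positions: a single base if unpaired,
    otherwise base + partner base (one of the 16 ordered pairings)."""
    partner = pair_partners(structure)
    symbols = []
    for i in sorted(footprint):
        symbols.append(sequence[i] + sequence[partner[i]] if i in partner else sequence[i])
    return symbols

def alignment_score(s, t, match=1, mismatch=-1, gap=-1):
    """Global (Needleman-Wunsch) alignment score of two symbol sequences."""
    prev = [j * gap for j in range(len(t) + 1)]
    for i in range(1, len(s) + 1):
        cur = [i * gap] + [0] * len(t)
        for j in range(1, len(t) + 1):
            diag = prev[j - 1] + (match if s[i - 1] == t[j - 1] else mismatch)
            cur[j] = max(diag, prev[j] + gap, cur[j - 1] + gap)
        prev = cur
    return prev[-1]

# Hypothetical DNA hairpins and footprinted positions.
e1 = encode_epitope("GCGAAAGC", "((....))", {1, 2, 3, 4})
e2 = encode_epitope("GCGAATGC", "((....))", {1, 2, 3, 4, 5})
print(e1, e2, alignment_score(e1, e2))  # ['CG', 'G', 'A', 'A'] ['CG', 'G', 'A', 'A', 'T'] 3
```

Treating a base pair as a single symbol means a stem contact and a loop contact involving the same base are kept distinct, which is the point of the 20-element alphabet.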

It is also contemplated that the characterization of the
proteins by reactivity descriptors be combined with their
characterization by aptamer descriptors [...]. If desired, the
work involved may be reduced by first screening a combinatorial
library for molecules which bind the protein, then using the
bound molecules (aptamers) from that library to modulate the
chemical reactivity of the protein and thereby help
characterize its binding site(s) by means of chemical
reactivity descriptors. However, our preference is to use
peptides to alter reactivity and nucleic acids to generate
aptamer descriptors.

On the basis of the reactivity descriptors and/or aptamer-based
descriptors, the similarity of the query protein to each of the
reference proteins is determined. For each reference protein,
one or more drug leads are known, so these drug leads may be
rated or ranked, as drug leads for modulators of the query
protein, on the basis of the similarity of their reference
protein to the query protein and of their own drug
characteristics (e.g., potency, half-life, side effects).

A combinatorial library is then synthesized which is
enriched for members which are structurally similar to the
aforementioned drug leads. Structurally similar members may
be identified in a formal manner by use of the chemical
structure descriptors available in the art, or more informally
through a chemist's expert judgment of structural similarity.
One may also resort to [...], i.e., to synthesizing lead
analogues on an individual, noncombinatorial basis.

The lead analogues, whether in the form of a combinatorial
library or individually synthesized, are screened for the
ability to modulate the target protein's activity, in vitro or
in vivo. Successful analogues are added to the list of drug
leads associated with the target protein, which now becomes a
reference protein.
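The rating step above (rank known leads by how similar their reference protein is to the query, then by their own drug characteristics) can be sketched with a lexicographic sort. All of the following are hypothetical: the data class, the field names, the numbers, and the specific tie-breaking order of similarity, then potency, then side-effect liability.

```python
from dataclasses import dataclass

@dataclass
class Lead:
    name: str
    reference_protein: str
    potency: float        # hypothetical score; higher is better (e.g. -log10 IC50)
    side_effects: float   # hypothetical liability score; lower is better

def rank_leads(leads, similarity, min_similarity=0.0):
    """Order drug leads by (1) similarity of their reference protein to the query,
    then (2) potency, then (3) side-effect liability. Leads whose reference protein
    is less similar than min_similarity are dropped."""
    def sim(lead):
        return similarity.get(lead.reference_protein, 0.0)
    kept = [lead for lead in leads if sim(lead) >= min_similarity]
    return sorted(kept, key=lambda lead: (-sim(lead), -lead.potency, lead.side_effects))

similarity = {"refX": 0.95, "refY": 0.40, "refZ": 0.95}  # hypothetical
leads = [
    Lead("L1", "refY", potency=9.0, side_effects=0.1),
    Lead("L2", "refX", potency=6.0, side_effects=0.1),
    Lead("L3", "refZ", potency=7.0, side_effects=0.2),
    Lead("L4", "refX", potency=7.0, side_effects=0.5),
]
print([lead.name for lead in rank_leads(leads, similarity)])                      # ['L3', 'L4', 'L2', 'L1']
print([lead.name for lead in rank_leads(leads, similarity, min_similarity=0.5)])  # ['L3', 'L4', 'L2']
```

A lexicographic key lets target similarity dominate; a weighted sum of the three criteria would be an equally reasonable alternative if potency should be able to outweigh a modest drop in similarity.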






Because the initial drug leads are identified through querying
a database of reference proteins and their associated drug
leads, the combinatorial library is screened primarily for the
purpose of optimizing these leads, although at least some
members will be sufficiently different in structure from the
lead that there will be some secondary lead discovery as well.

The appended claims are hereby incorporated by reference
as a description of the preferred embodiments.
BRIEF DESCRIPTION OF THE DRAWINGS



Figure 1. [...] reactivity of the transition-metal oxidants.
Ru(tpy)(bpy)O^2+ (RuO^2+) reacts by hydrogen abstraction at
C1' and by oxygen transfer at guanine C8. Pt2(pop)4^4- reacts
by hydrogen abstraction at C4' and C5', and by outer-sphere
electron transfer at guanine. Ru(bpy)3^3+ reacts only by
outer-sphere electron transfer. The reactivity of the complexes
toward amino acids should therefore be different for all three
reagents as well.

Figure 2. Amino acids likely to be reactive by hydrogen
abstraction. Probable sites of C-H activation are shown in
boxes.

Figure 3. Scheme showing the solvent accessibility of reactive
amino acid residues (*) as a function of [...] of the BioKey
peptide.

Figure 4. Scheme showing the assay used to determine the
relative rates for modification of a protein by a reagent. The
presence of the protein decreases the yield of Form II DNA in
a manner related to the rate constant for oxidation of the
protein by the reagent.

Figure 5. [...] of an RNA [...]; [...] is a two-dimensional
grid representation of the same [...].

Figure 6. Flow Chart of Preferred Method.
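The Figure 4 assay can be read as a competition experiment: protein and DNA compete for the same reagent, so the drop in Form II (nicked) DNA yield reflects how fast the protein reacts relative to DNA. Under simple competition kinetics, Y = Y0 * kD[D] / (kD[D] + kP[P]), which rearranges to kP/kD = ([D]/[P]) * (Y0/Y - 1). The sketch below assumes a limiting reagent consumed only by DNA and protein, both present in excess so their concentrations stay roughly constant, and a Form II yield proportional to the fraction of reagent that reacts with DNA. These are assumptions, and the numbers are hypothetical.

```python
def form_ii_yield(y0, k_dna, k_protein, dna, protein):
    """Form II yield when protein competes with DNA for a limiting reagent.

    Assumes the reagent is consumed only by DNA and protein, both present in
    excess so their concentrations stay roughly constant, and that the Form II
    yield is proportional to the fraction of reagent that reacts with DNA.
    """
    return y0 * k_dna * dna / (k_dna * dna + k_protein * protein)

def relative_rate_constant(y0, y, dna, protein):
    """k_protein / k_dna recovered from the drop in Form II yield (y0 -> y)."""
    if not 0 < y <= y0:
        raise ValueError("expected 0 < y <= y0")
    return (dna / protein) * (y0 / y - 1.0)

# Hypothetical measurement: Form II fraction falls from 0.40 to 0.25 when
# 5 uM protein is added to 10 uM DNA (nucleotide) in the same reaction.
ratio = relative_rate_constant(y0=0.40, y=0.25, dna=10.0, protein=5.0)
print(round(ratio, 3))  # 1.2
```

Repeating the measurement with each reagent gives a per-reagent relative rate constant, and such values could serve as the entries of a protein's reactivity descriptor.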
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS OF THE
INVENTION

The present invention is directed to a method for the more
[...] identification of small organic molecules [...] which are
pharmaceutically acceptable and which are potent modulators of
the biological activity of a protein.

Receptor-Mediated Pharmacology.

Many pharmacologically active substances elicit a response by
interacting with a specialized portion of the target cell,
known as a receptor. The substances which are able to elicit
that response, by specific interaction with a receptor site,
are known as agonists. Typically, increasing the concentration
of the agonist at the receptor site leads to an increasingly
larger response, until a maximum response is achieved. A
substance able to elicit the maximum response is known as a
full agonist, and one which elicits only, at most, a lesser
(but discernible) response is a partial agonist.
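The saturating behavior and the distinction between full and partial agonists described above are commonly summarized with the Hill (Emax) model: E = Emax * A^n / (A^n + EC50^n). The sketch uses that standard model for illustration only; the potency and efficacy values are hypothetical, and the source does not prescribe any particular model.

```python
def agonist_response(conc, emax, ec50, hill=1.0):
    """Fraction of the system's maximal response (Hill / Emax model).

    emax = 1.0 models a full agonist; 0 < emax < 1.0 models a partial agonist.
    """
    if conc <= 0:
        return 0.0
    return emax * conc ** hill / (conc ** hill + ec50 ** hill)

full = dict(emax=1.0, ec50=1e-8)     # hypothetical parameters (molar)
partial = dict(emax=0.4, ec50=1e-8)
for conc in (1e-9, 1e-8, 1e-7, 1e-5):
    print(f"{conc:.0e} M  full={agonist_response(conc, **full):.3f}  "
          f"partial={agonist_response(conc, **partial):.3f}")
```

Both curves rise with concentration and plateau, but the partial agonist levels off at 40% of the system maximum no matter how much is added.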

A pharmacological antagonist is a compound which interacts
with the receptor but without eliciting a response. By doing
so, it inhibits the receptor from responding to agonists. A
competitive antagonist is one whose effect can be overcome by
increasing the agonist concentration; a noncompetitive
antagonist is one whose effect is unaffected by the agonist
concentration.

Agonists and pharmacological antagonists are therefore both
substances which bind to receptors. Ligands which activate
(agonize) or inhibit (antagonize) the receptor are here termed
modulators.
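The competitive/noncompetitive distinction above can be made concrete with two textbook response models. The competitive case uses the Gaddum equation, in which an antagonist at concentration B with dissociation constant KB raises the apparent EC50 by the factor (1 + B/KB). The noncompetitive case is an idealized model (an assumption) that removes the same fraction of the response at every agonist concentration. All parameter values are hypothetical.

```python
def competitive(agonist, antagonist, emax, ec50, kb):
    """Gaddum equation: the antagonist raises the apparent EC50 by (1 + B/KB)."""
    return emax * agonist / (agonist + ec50 * (1.0 + antagonist / kb))

def noncompetitive(agonist, antagonist, emax, ec50, kb):
    """Idealized noncompetitive block: the same fractional loss at every agonist level."""
    return emax / (1.0 + antagonist / kb) * agonist / (agonist + ec50)

params = dict(emax=1.0, ec50=1.0, kb=1.0)  # hypothetical, arbitrary units
for a in (1.0, 10.0, 1e4):
    print(a, round(competitive(a, 1.0, **params), 3),
          round(noncompetitive(a, 1.0, **params), 3))
```

With the antagonist held at B = KB, raising the agonist concentration restores nearly the full response in the competitive case but never more than half of it in the noncompetitive case. This is the "can be overcome by increasing the agonist concentration" criterion expressed numerically.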

The concept of physiological antagonism is broader than
the pharmacological concept, including phenomena that do not
directly involve agonist-receptor binding. A physiological
antagonist could be a substance which directly or indirectly
inhibits the production or release of the natural agonist, or
directly or indirectly facilitates its elimination from the
receptor site. A physiological antagonist of one receptor may
be a pharmacological agonist of another, such as one which
activates an enzyme which degrades the natural ligand of the
first receptor.



If a disease state is the result of inappropriate
activation of a receptor, the disease may be prevented or
treated by means of a physiological or pharmacological
antagonist. Other disease states are the result of inadequate
activation of a receptor and may be treated or prevented by
means of an agonist.

An important class of receptors are proteins embedded in
the phospholipid bilayer of cell membranes. The binding of an
agonist to the receptor (typically at an extracellular binding
site) can cause an allosteric change at an intracellular site,
leading to interaction with other biomolecules. The
physiological response is initiated by the interaction with
this "second messenger" (the agonist being the "first
messenger") or "effector" molecule.

The peptides and nucleic acids of the present
invention can act as agonists (binding to the receptor and
causing its activation), as antagonists (binding to the
receptor without activating it, and blocking its activation by
agonists), or as pharmacologically neutral species (binding to
the receptor without either activating or blocking it).

Enzymes are special types of receptors. Receptors interact with agonists to form complexes which elicit a biological response. Ordinary receptors then release the agonist intact. With enzymes, the agonists are called substrates, and the enzymes release a chemically modified form of the substrate. Enzymes may be membrane proteins, they may be secreted, or they may be intracellular proteins.

The substrate of an enzyme may itself be the product of an "upstream" enzymatic reaction.

Thus, drugs may also be useful because of their interaction with enzymes. The drug may act as a substrate for the enzyme, as a coenzyme, or as a reversible or irreversible inhibitor. It may also affect, directly or indirectly, the conversion of a proenzyme into an enzyme. Many disease states are associated with inappropriately low or high activity of particular enzymes.

The present invention may be used to identify both agonists and antagonists of receptors. It is not unusual for a relatively small structural change to convert an agonist into a pharmacological antagonist, or vice versa. If the drugs known to interact with a reference protein are all agonists, they may nonetheless serve as leads to the identification of both agonists and antagonists of the reference protein and of related proteins. Similarly, known antagonists may serve as drug leads, not only to additional antagonists, but to agonists as well.

The nucleic acids and peptides used in developing descriptors are selected only for their ability to bind the target protein; it is not required that they activate or inhibit it. In some cases activation or inhibition will occur; in others it will not. Their function is to characterize the target protein surface, not to serve as drug leads themselves (although the practitioner is free to test the nucleic acids and peptides for agonist/antagonist activity and to use the active ones as leads in the design of active analogues which are more suitable than nucleic acids and peptides for use as drugs).

Protein Binding and Binding Sites

Many of the biological activities of proteins are attributable to their ability to bind specifically to one or more binding partners, which may themselves be proteins, or other biomolecules.

When the binding partner of a protein is known, it is possible to study the interaction of the protein and its binding partner to determine how binding affects biological activity. Moreover, one may screen compounds for the ability of the compound to competitively inhibit the formation of the complex, or to dissociate an already formed complex. Such inhibitors are likely to affect the biological activity of the protein, at least if they can be delivered to the site of the interaction.

na«„" t"""^ " -"^ <^^- binding 

partner an effector of the biological activity, then the 
xnhxhxtor will antagonize the biological activil;. if the 
B b.nd.ng partner is one which, through binding blocks a 

wlu "rifftr?' ^"^^"^ - that'interactio 

The residues of the protein whose functional groups participate in the binding interactions together form the ligand binding site. The groups of the ligand which participate in these interactions together form the epitope of the ligand.

In the case of a protein, the binding sites are typically surface patches. The binding characteristics of the protein may often be altered by local modifications at these sites, without denaturing the protein.

While it is possible for a chemical reaction to occur between a functional group on a protein and one on its binding partner, binding is usually the aggregate effect of several noncovalent interactions. Electrostatic interactions include salt bridges, hydrogen bonds, and van der Waals forces.
25 ah= ' '•y='^°Phobic interaction is actually the 

rittTthi:ro:r::rertr b~ ™ 
stbnt- th ™— :r rrr- r: 

stabilizing the conformation of the protein, and thus indirectly affect binding. The same considerations apply to the interaction of the protein with other biologically significant substances, e.g., nucleic acids, lipids and enzyme substrates.

Potency

The potency of an antagonist or agonist of a protein may be expressed



wo 99/06839 



PCT/US98/15943 



as the IC50, i.e., the concentration of the antagonist which causes a 50% inhibition of a protein's binding or biological activity in the assay system of interest. A pharmaceutically effective dosage of an antagonist depends on both the IC50 of the antagonist, and the effective concentrations of the protein and its clinically significant binding partner(s).

Potencies may be categorized as follows:

Category        IC50
Very weak       >1 µmole
Weak            100 nmoles to 1 µmole
Moderate        10 to 100 nmoles
Strong          1 pmole to 10 nmoles
Very strong     <1 pmole

Preferably, the antagonists identified by the present invention are in one of the four highest categories identified above, and are in any event more potent than any antagonist previously known for the target protein. The potency of an agonist may be expressed analogously, as the concentration causing 50% of its maximal effect on a receptor.
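The potency table can be applied mechanically to assay results. The sketch below is only an illustration: it assumes IC50 values are supplied in mol/L (reading the table's "moles" as molar concentrations), and the function names `potency_category` and `in_four_highest` are hypothetical.

```python
def potency_category(ic50_molar: float) -> str:
    """Map an IC50 (mol/L) onto the potency categories in the table above.

    Assumption: a value lying exactly on a band edge is placed in the weaker
    band, except 1 umol, which the table places in "Weak" because "Very weak"
    is defined as strictly greater than 1 umol.
    """
    if ic50_molar <= 0:
        raise ValueError("IC50 must be positive")
    if ic50_molar > 1e-6:
        return "Very weak"
    if ic50_molar >= 1e-7:
        return "Weak"
    if ic50_molar >= 1e-8:
        return "Moderate"
    if ic50_molar >= 1e-12:
        return "Strong"
    return "Very strong"


def in_four_highest(ic50_molar: float) -> bool:
    """True if the IC50 falls in one of the four most potent categories."""
    return potency_category(ic50_molar) != "Very weak"


for value in (2e-6, 5e-7, 5e-8, 5e-9, 1e-13):
    print(f"{value:.0e} M -> {potency_category(value)}")
```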
Drug Leads

The term "drug lead" refers to a compound which is a member of a structural class which is generally suitable, in terms of physical characteristics, for use in drugs, and which can serve effectively as a starting point for the design of compounds useful as drugs. The drug lead may be a useful drug in its own right, or it may be a compound which is deficient as a drug because of inadequate potency or undesirable side effects. In the latter case, analogues and derivatives are sought which overcome these deficiencies. In the former case, one may seek to improve upon an already useful drug.

The analogues and derivatives may be identified by rational drug design, or by screening of combinatorial or noncombinatorial libraries of analogues and derivatives.

Preferably, the drug lead has a molecular weight of less than 1,000, more preferably, less than 750, still more preferably, less than 600, most preferably, less than 500. Preferably, it has a computed log octanol-water partition coefficient in the range of -4 to +14, more preferably, -2 to +7.5.
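The molecular-weight and log P preferences above amount to a tiered filter. The sketch below encodes those thresholds, assuming the molecular weight (in daltons) and computed log P have already been obtained elsewhere; the names `mw_preference` and `logp_preference` are hypothetical, and no particular cheminformatics toolkit is assumed.

```python
# Thresholds transcribed from the preferences stated above, tightest first.
MW_TIERS = (
    (500, "most preferred"),
    (600, "still more preferred"),
    (750, "more preferred"),
    (1000, "preferred"),
)


def mw_preference(mol_weight: float) -> str:
    """Return the preference tier for a molecular weight in daltons."""
    for cutoff, label in MW_TIERS:
        if mol_weight < cutoff:
            return label
    return "outside preferred range"


def logp_preference(clogp: float) -> str:
    """Return the preference tier for a computed log octanol-water coefficient."""
    if -2.0 <= clogp <= 7.5:
        return "more preferred"
    if -4.0 <= clogp <= 14.0:
        return "preferred"
    return "outside preferred range"


# Hypothetical candidate values.
print(mw_preference(450.0), "/", logp_preference(3.2))
```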

Target Proteins

The target protein may be a naturally occurring protein, or domain thereof, of a microorganism (including bacteria, fungi, algae, and protozoa), an invertebrate (including insects and worms), or the normal or cancerous cells of a vertebrate (especially a mammal, bird or fish and, among mammals, particularly humans, apes, monkeys, cows, pigs, goats, llamas, sheep, rats, mice, rabbits, guinea pigs).

The target protein may also be a modified form of such a protein (e.g., to facilitate purification or immobilization of the target protein) or a mutant protein (e.g., where an inhibitor is sought to selectively inhibit an undesired activity of the mutant protein and leave other activities substantially intact).

It may be a glyco-, lipo-, phospho-, or metalloprotein. It may be a nuclear, cytoplasmic, membrane, or secreted protein. It may, but need not be, ... The known binding partners, if any, of the target protein may be, inter alia, other proteins, oligomers, carbohydrates, lipids, or small organic or inorganic molecules or ions. The biological activity of the target protein may be, e.g., any of the following:

kinase
  protein kinase
    tyrosine kinase
    threonine kinase
    serine kinase
  nucleotide kinase
  polynucleotide kinase
phosphatase
  protein phosphatase
  nucleotide phosphatase
  acid phosphatase
  alkaline phosphatase
  pyrophosphatase
deaminase
protease
  endoprotease
  exoprotease
  metalloprotease
  serine endopeptidase
  cysteine endopeptidase
nuclease
  deoxyribonuclease
  ribonuclease
  endonuclease
  exonuclease
polymerase
  DNA dependent RNA polymerase
  DNA dependent DNA polymerase
  telomerase
  primase
helicase
dehydrogenase
transferase
  peptidyl transferase
  transaminase
  glycosyltransferase
  ribosyltransferase
  acetyltransferase
hydrolase
urease
carboxylase
isomerase
dismutase
rotamase
topoisomerase
glycosidase
  endoglycosidase
  exoglycosidase
deaminase
lipase
esterase
sulfatase
cellulase
lyase
reductase
synthetase
ion channel
DNA binding
RNA binding
ligase
  RNA ligase
  DNA ligase
adaptor or scaffolding protein
structural protein
  fibrin(ogen)
  collagen
  elastin
  talin
tumor suppressor
adhesion molecule
oxygenase
oxidase
peroxidase
chaperonin
transporter
  electron transporter
  protein transporter
  peptide transporter
  hormone transporter
  serotonin
  DOPA
  nucleic acid transporter
signal transduction
neurotransmitter
structural component
  of viruses
  of cells
  of organs
  of organisms
information carrier/storage
antigen recognition protein
  MHC I complex
  MHC II complex
receptor
  TNFα receptor
  TNFβ receptor
  β-adrenergic receptor
  α-adrenergic receptor
  IL-8 receptor
  IL-3 receptor
  CSF receptor
  erythropoietin receptor
  FAS ligand receptor
  T-cell receptors
  B-cell antigen receptor
  Fc epsilon receptor
  growth hormone receptor
  nuclear receptors
    glucocorticoid
    estrogen
    testosterone

The binding protein may have more than one paratope, and they may be the same or different. Different paratopes may interact with epitopes of different binding partners. An individual paratope may be specific for a particular binding partner, or may interact with several different binding partners. A protein can bind a particular binding partner through several different binding sites. The binding sites may be continuous or discontinuous (vis-a-vis the primary sequence of the protein).

Prote^oi '"V"^'' °' ^^'^^^"^^^ P-^^-' any 

::r;r:teinr t:"T^ 

database. P^^^^^^^' ^^e information is added to the 

For the purpose of validating the chemical reactivity descriptor concept of the present invention, a particularly preferred initial target protein is glutathione-S-transferase (GST), which is chosen because it has been crystallographically characterized with and without a large number of bound inhibitors (37), its level of expression has been measured in NCI's 60 cell lines (13), the activities of many known inhibitors are available (38), and macroscopic quantities of peptides that bind to the active site are preparable.
Other preferred proteins for early incorporation into the database are ones that are clinically relevant, have known small-molecule inhibitors, and whose expression has been assessed by biological methods. Proteins for which peptide ligands exist and for which expression data are available at NCI include ras (39), src (40), and p53 (41). Others for which both kinds of data are likely to be available in the future are the UL44 protein from cytomegalovirus, and the hMDM2 protein that binds to p53 (41). Crystallographically characterized targets are of particular interest and utility in the early stages.

A list of agonists, antagonists, radioligands and effectors for many different receptors appears in Appendix I of Medicinal Chemistry: Principles and Practice, pp. ...-294 (Royal Soc'y Chem. 1994). Appendix II lists blockers for various ion channels (which are another special type of receptor). The proteins set forth in these appendices are good candidates for inclusion in the reference protein database.

It will be appreciated that once a ligand is identified for a query protein, that former query protein becomes a reference protein.

Combinatorial Libraries

The term "library" generally refers to a collection of chemical or biological entities which are related in origin, structure, and/or function, and which can be screened simultaneously for a property of interest.
The term "combinatorial library" refers to a library in which the individual members are either systematic or random combinations of a limited set of basic elements, the properties of each member being dependent on the choice and location of the elements incorporated into it. Typically, the members of the library are at least capable of being screened simultaneously. Randomization may be complete or partial; some positions may be randomized and others predetermined, and at random positions, the choices may be limited in a predetermined manner. The members of a combinatorial library may be oligomers or polymers of some kind, in which the variation occurs through the choice of monomeric building block at one or more positions of the oligomer or polymer, and possibly in terms of the connecting linkage, or the length of the oligomer or polymer, too. Or the members may be nonoligomeric molecules with a standard core structure, like the 1,4-benzodiazepine structure, with the variation being introduced by the choice of substituents at particular variable sites on the core structure. Or the members may be nonoligomeric molecules assembled like a jigsaw puzzle, but wherein each piece has both one or more variable moieties (contributing to library diversity) and one or more constant moieties (providing the functionalities for coupling the piece in question to other pieces).

The ability of one or more members of such a library to recognize a target molecule is termed "Combinatorial Recognition". In a "simple combinatorial library" all of the members belong to the same class of compounds and can be synthesized simultaneously. A "composite combinatorial library" is a mixture of two or more simple libraries, e.g., DNAs and peptides, or benzodiazepines and carbamates. The number of component simple libraries in a composite library will, of course, normally be smaller than the average number of members in each simple library, as otherwise the advantage of a library over individual synthesis is small.

Oligonucleotide Libraries

An oligonucleotide library is a combinatorial library, at least some of whose members are oligonucleotides having three or more nucleotides connected by phosphodiester or analogous bonds. The oligonucleotides may be linear, cyclic or branched, and may include non-nucleic acid moieties. The nucleotides are not limited to the nucleotides normally found in DNA or RNA. For examples of nucleotides modified to increase the nuclease resistance and chemical stability of aptamers, see the chart of modified nucleotides in Osborne and Ellington, Chem. Rev., 97:349-70 (1997).



The number of possible sequences of an oligonucleotide increases exponentially with its length in bases. Hence a longer oligonucleotide is more likely to be able to fold so as to adapt itself to a target. On the other hand, while very long oligonucleotides can be synthesized and screened, unless they provide much higher affinity than that of shorter molecules, they may not be favored in the selected population (Osborne and Ellington 1997). Hence the oligonucleotides of the present invention preferably have a length of 3 to 100 bases. The oligonucleotides in a given library may be of the same or of different lengths.

Oligonucleotide libraries have the advantage that libraries of very high diversity are readily prepared, and binding molecules are readily amplified by the polymerase chain reaction (PCR). Moreover, oligonucleotides can have very high specificity and affinity for their targets. Methods of preparing and screening such libraries are described in Klug and Famulok, Molec. Biol. Rep., 20:97-107 (1994); L. Gold, C. Tuerk et al., ... nucleic acid ligands, U.S. ...

The term "aptamer" is conferred on those oligonucleotides which bind the target protein. Such aptamers may be used to characterize the target protein, either directly (through identification of the aptamer and of the points of contact between the aptamer and the protein) or indirectly (by use of the aptamer as a ligand to modify the chemical reactivity of the protein).
Peptide Libraries

A peptide library is a combinatorial library, at least some of whose members are peptides, i.e., amino acids connected via peptide bonds. The peptides may be linear, branched or cyclic, and may include nonpeptidyl moieties. The amino acids are not limited to the naturally occurring amino acids.

A "biased" peptide library is one in which one or more (but not all) of the residues are constant.

In one embodiment, an internal residue is constant, so that the peptide sequence may be written as

(Xaa)m-AA-(Xaa)n

where AA is the constant amino acid, and each Xaa is independently a random amino acid, or any amino acid from a predetermined set. Preferably, m and n are chosen independently from the range of ...

Preferably, AA is located at or near the center of the peptide. More specifically, it is desirable that m and n not differ greatly; most preferably, m and n are equal. Even if the chosen AA is required for (or at least permissive of) target protein (TP) binding activity, one may need particular flanking residues to assure that it is properly positioned. If AA is more or less centrally located, the library presents numerous alternative choices for the flanking residues. If AA is at or near an end of the peptide, this flexibility is diminished.
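A (Xaa)m-AA-(Xaa)n library member can be generated by drawing the flanking residues at random around a fixed central residue. The sketch below is a minimal in-silico illustration, assuming only the 20 standard one-letter amino acid codes and uniform sampling; it says nothing about how such a library is physically synthesized, and `biased_peptide` is a hypothetical name.

```python
import random

# The 20 standard amino acids (one-letter codes); the text notes that
# nonstandard residues may also be used, which this sketch ignores.
STANDARD_AA = "ACDEFGHIKLMNPQRSTVWY"


def biased_peptide(constant: str, m: int, n: int, rng: random.Random,
                   allowed: str = STANDARD_AA) -> str:
    """Build one member of a (Xaa)m-AA-(Xaa)n biased peptide library."""
    left = "".join(rng.choice(allowed) for _ in range(m))
    right = "".join(rng.choice(allowed) for _ in range(n))
    return left + constant + right


rng = random.Random(42)
# m == n keeps the constant residue (here tryptophan, W) centred.
library = [biased_peptide("W", 4, 4, rng) for _ in range(5)]
print(library)
```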

The most preferred biased libraries are those in which AA is tryptophan, proline or tyrosine. Second most preferred are

st^ta^e ^T- ''i-^ine. arg nine 

Next most preferred are those in which AA is asparagine, serine, alanine or methionine. The least preferred choices are cysteine and glycine. These preferences are based on evaluation of the results of screening random peptide libraries for binding to many different TPs.

Ligands that bind to functional domains tend to have both constant and unique features. Therefore, by using "biased" peptide libraries, one can ease the burden of finding ligands. Either "biased" or "unbiased" libraries may be screened to identify "BioKey" peptides for use in developing reactivity descriptors and, optionally, peptide aptamer descriptors and additional drug leads.

Descriptors

A "descriptor" (also known as a parameter, character, variable, or variate) is a numerically expressed characteristic of a compound (which may be a protein, or a protein ligand), which helps to distinguish that compound from others. A descriptor value need not be absolutely specific to a compound to be useful. The characteristics may be pure structural characteristics (as in a "structural descriptor") or they may refer to the compound's interaction with other compounds, such as a binding interaction (as in an "aptamer descriptor") or a chemical reaction (as in a "reactivity descriptor"). Comparable descriptors are measurements of the same property in two different molecules. A "descriptor array", "list" or "set" is an array, list or set whose elements are different descriptors for the same molecule.

A plurality of comparable descriptors for two compounds may be used to calculate a similarity for the two compounds. The descriptors used in making this calculation in the present invention, for two proteins, will include (a) at least one chemical reactivity descriptor, and/or (b) at least one peptide or oligonucleotide affinity descriptor, and preferably both.

The similarity calculation may optionally consider other descriptors as well, such as structural descriptors of known ligands.

If n descriptors are used, defining an n-dimensional descriptor space, each compound for which a descriptor set is available may be said to occupy a point in descriptor space. The similarity of two compounds is inversely related to the distance between the two points which they occupy in descriptor space.

A similarity measure or coefficient quantifies the relationship between two individuals (compounds), given the values of a set of variates (descriptors) common to both. Similarity coefficients are usually defined to take values in the range of 0 to 1.

One commonly used measure of similarity is the product moment correlation coefficient. Its correlation is unity whenever two profiles are parallel, regardless of how far apart they are in level. Two profiles may have a correlation of 1 even if they are not parallel, provided that the two sets of scores are linearly related.
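Both properties of the product moment correlation coefficient can be checked directly. The sketch below is a plain implementation of Pearson's r for two descriptor profiles (the function name `pearson` and the example profiles are illustrative): a profile shifted by a constant and a profile rescaled by a constant both give r = 1.

```python
import math


def pearson(x, y):
    """Product moment correlation coefficient of two descriptor profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two profiles of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant profile")
    return sxy / math.sqrt(sxx * syy)


profile = [1.0, 2.0, 3.0, 4.0]
print(pearson(profile, [11.0, 12.0, 13.0, 14.0]))  # 1.0 (parallel, offset by 10)
print(pearson(profile, [2.0, 4.0, 6.0, 8.0]))      # 1.0 (not parallel, but linear)
```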

For binary descriptors, the simplest measure of similarity is the simple matching coefficient:

$$S_{ij} = \frac{\text{number of matches}}{\text{number of comparisons}}$$

The Jaccard or Sneath coefficient modifies the simple matching coefficient by ignoring bits which are 0 in both i and j, i.e., negative matches (mutual absences). In other words, it is obtained by taking the number of bits which are set in both descriptor bit strings and dividing by the total number of bits set in either descriptor bit string:

$$S_{ij} = \frac{c}{a + b - c}$$

where a and b are the numbers of bits set in i and j, respectively, and c is the number of bits set in both. It is also called the unweighted Tanimoto coefficient.
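The difference between the two binary coefficients is whether mutual absences count as agreement. The sketch below implements both for 0/1 lists; the function names are illustrative, and raising an error for two all-zero strings is an assumed convention, since the Jaccard coefficient is undefined there.

```python
def simple_matching(a, b):
    """Fraction of positions at which two bit strings agree (0s and 1s alike)."""
    if len(a) != len(b):
        raise ValueError("bit strings must have equal length")
    return sum(1 for x, y in zip(a, b) if x == y) / len(a)


def jaccard(a, b):
    """Bits set in both / bits set in either; mutual absences are ignored."""
    if len(a) != len(b):
        raise ValueError("bit strings must have equal length")
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    if either == 0:
        raise ValueError("coefficient undefined for two all-zero strings")
    return both / either


i_bits = [1, 1, 0, 0, 1]
j_bits = [1, 0, 0, 0, 1]
print(simple_matching(i_bits, j_bits))  # 0.8: the two shared 0s count as matches
print(jaccard(i_bits, j_bits))          # 2/3: the shared 0s are ignored
```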






The weighted Tanimoto coefficient for descriptors k and individuals i and j is:

$$S_{ij} = \frac{\sum_k x_{ik} x_{jk}}{\sum_k x_{ik}^2 + \sum_k x_{jk}^2 - \sum_k x_{ik} x_{jk}}$$
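The Tanimoto formula applies to real-valued descriptors and reduces to the Jaccard coefficient when every value is 0 or 1. The sketch below shows this (the function name `tanimoto` and the second pair of vectors are illustrative).

```python
def tanimoto(x, y):
    """Tanimoto coefficient for real-valued (or 0/1) descriptor vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have equal length")
    xy = sum(a * b for a, b in zip(x, y))
    xx = sum(a * a for a in x)
    yy = sum(b * b for b in y)
    denom = xx + yy - xy
    if denom == 0:
        raise ValueError("coefficient undefined for two zero vectors")
    return xy / denom


print(tanimoto([1, 1, 0, 0, 1], [1, 0, 0, 0, 1]))  # 2/3, same as the Jaccard value
print(tanimoto([0.2, 0.9, 0.4], [0.3, 0.8, 0.5]))
```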

Gower's general similarity coefficient can be used for binary, qualitative, and quantitative data:

$$S_{ij} = \frac{\sum_{k=1}^{p} w_{ijk} s_{ijk}}{\sum_{k=1}^{p} w_{ijk}}$$

where s_ijk is the similarity between individuals i and j for descriptor k, and w_ijk is set to 1 if the comparison is valid for descriptor k and to 0 otherwise. If w_ijk = 0, descriptor k contributes nothing to either sum. For binary data, w_ijk and s_ijk are both 0 if the variable is negative in both individuals; w_ijk = 1 and s_ijk = 0 if it is positive in only one; and w_ijk and s_ijk are both 1 if the variable is positive for both individuals. For qualitative data, s_ijk = 1 if the individuals are the same for the kth character, and s_ijk = 0 if they differ. For quantitative data, s_ijk = 1 - |x_ik - x_jk|/R_k, where x_ik is the value of descriptor k for individual i, and R_k is the total range of variable k.
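Gower's rules for mixed descriptor types can be written as one loop over descriptors. The sketch below follows the binary, qualitative and quantitative rules stated above; representing a missing observation as `None` (giving w_ijk = 0) and the `kinds`/`ranges` argument layout are assumptions made for illustration.

```python
def gower(xi, xj, kinds, ranges):
    """Gower's general similarity coefficient.

    kinds[k] is "binary", "qualitative" or "quantitative"; ranges[k] is the
    total range R_k of a quantitative descriptor (ignored otherwise).
    A value of None marks a missing observation, so w_ijk = 0 for it.
    """
    num = den = 0.0
    for k, (a, b) in enumerate(zip(xi, xj)):
        if a is None or b is None:
            continue                      # comparison not valid: w_ijk = 0
        kind = kinds[k]
        if kind == "binary":
            if not a and not b:
                continue                  # mutual absence: w_ijk = s_ijk = 0
            s = 1.0 if (a and b) else 0.0
        elif kind == "qualitative":
            s = 1.0 if a == b else 0.0
        elif kind == "quantitative":
            s = 1.0 - abs(a - b) / ranges[k]
        else:
            raise ValueError(f"unknown descriptor kind: {kind}")
        num += s                          # w_ijk = 1 for every counted term
        den += 1.0
    if den == 0:
        raise ValueError("no valid comparisons")
    return num / den


kinds = ["binary", "qualitative", "quantitative"]
ranges = [None, None, 10.0]
print(gower([1, "acidic", 2.0], [1, "basic", 4.0], kinds, ranges))  # (1 + 0 + 0.8) / 3
```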

Descriptors may be quantitative or qualitative. Quantitative descriptors may be integers or real numbers. Qualitative descriptors divide the data into categories, which may be, but need not be, expressible as having relative magnitudes. Binary descriptors are a special case of qualitative descriptors, in which there are just two categories, typically representing the presence or absence of a feature. Qualitative data for which the variates have several levels may be treated like binary data, with each level of a variate being regarded as a binary variable (e.g., an eight-level variate expressed as eight bits), or the levels may be numbered sequentially (i.e., an eight-level variable expressed as three bits).
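The two encodings of a multi-level qualitative variate differ in size and in how they affect bit-based similarity. The sketch below encodes a level as one bit per level and as a sequential binary number; the function names are illustrative.

```python
def one_hot(level, n_levels):
    """Each level becomes its own bit, e.g. 8 levels -> 8 bits."""
    if not 0 <= level < n_levels:
        raise ValueError("level out of range")
    return [1 if i == level else 0 for i in range(n_levels)]


def sequential_bits(level, n_levels):
    """Levels numbered 0..n-1 and written in binary, e.g. 8 levels -> 3 bits."""
    if not 0 <= level < n_levels:
        raise ValueError("level out of range")
    width = max(1, (n_levels - 1).bit_length())
    return [(level >> i) & 1 for i in reversed(range(width))]


print(one_hot(5, 8))          # [0, 0, 0, 0, 0, 1, 0, 0]
print(sequential_bits(5, 8))  # [1, 0, 1]
```

With one bit per level, any two distinct levels share no set bits, so a Jaccard-type coefficient scores them as equally dissimilar. With sequential numbering, levels 4 (100) and 5 (101) share a set bit, which introduces a similarity that may have no chemical meaning.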

Reactivity descriptors are preferably quantitative in form. If the aptamer descriptors are expressed as nucleic acid sequences, with or without secondary structure and protein contact information, they are qualitative in form. If they are expressed as fingerprints, they are a string of binary data. Gower's coefficient, for qualitative data, recognizes only exact matches of the variate. For aptamers, it is more useful to evaluate the similarity of the sequences by a BLAST-type analysis than simply to state whether they are the same or different.

A distance measure is a dissimilarity measure which is also a metric, i.e., for any x, y and z: (i) d(x,y) ≥ 0, and d(x,y) = 0 if x = y; (ii) d(x,y) = d(y,x); and (iii) d(x,z) + d(y,z) ≥ d(x,y) (the metric or triangular inequality). Of course, the greater the distance, the less the similarity.

Descriptors may be weighted or transformed for any of several reasons, including:

(a) to reflect the perceived value of the descriptor for determining whether two proteins will be modulated by structurally similar drugs;

(b) to reflect the perceived reliability of the descriptor data;

(c) to correct for differences in scale between descriptors, so that a descriptor does not dominate the similarity calculation merely because its values are of higher magnitude or are spread over a greater range; and

(d) to correct for correlations between descriptors.
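Reason (c) is commonly handled by rescaling each descriptor before distances are computed. The sketch below shows two standard rescalings, range scaling to [0, 1] and standardization to z-scores (zero mean, unit sample standard deviation); choosing these two is an illustration, not a statement of which transformations the method requires.

```python
import statistics


def range_scale(values):
    """Rescale to [0, 1] so that descriptors with large ranges do not dominate."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("descriptor has zero range")
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Convert to z-scores: zero mean, unit (sample) standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    if sd == 0:
        raise ValueError("descriptor has zero variance")
    return [(v - mean) / sd for v in values]


# Hypothetical values of one descriptor measured on three proteins.
column = [2.0, 4.0, 6.0]
print(range_scale(column))   # [0.0, 0.5, 1.0]
print(standardize(column))   # [-1.0, 0.0, 1.0]
```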

Descriptors may be transformed prior to use in calculating distances. Typical transformations are (a) presence (1)/absence (0), (b)

^'i.^Tr^^i:::;^:.;^' - - - — 

^ ' ' variance ^v'-v/^r \ 

::ri"irtre r — 

proteins is identified. Descrlctorr °' 
protein in the set. A tr^III" ' T ""^^""^'^ 
are also tested against eal c 1 ^""'""'"^ 
co^unds are chose! so that L3 ^"^'^ 
^0 at le t one compound ^JZ ra^^ ar^-^^t: 

::ed rricrr\ir:tr:r^r -^^^^-d^^ 

protein, using the calculated ""^^ 

example, it wi!l calculate , similarities. Por 

S other proteins, then relt the °' '° 

against the other proteins a U ""' 

the activit. o. . Jco^o::d:\gatT;oti:: V: T;r 

repeatedly .ith each protein taking ol the rol^e .ZLIITI: 

appix tl^IL arrerLt;t " "^^^^^ "-^-^ " 

Kauvar at al.. the ^^o^J^:: ^ 
compounds inst the guer. p^tein Z^Z:" 

e.g., insignificant secmence cSm,-!.. .^vcjrsicy, 

biological activities. Ttl plan Ir^" "''""^ 

the plan is to use the library for 






If there is no information available about the ultimate significance of a descriptor, one may give a greater weight to descriptors which have a larger range or a more uniform distribution.

It should be emphasized that we do not require the use of weighted descriptors, let alone any particular method of deriving weights.

It is likely that some degree of correlation will exist among the descriptors. Techniques such as cluster analysis and principal components analysis may be used to identify such correlation.

- lea. J. a^ai::: " jr "r. r^^e ^^""^ 

descriptors are strongly correlated !nd t T 

- ^ ~ 
.a„do., . — . ^^^^^^ 

one way of correcting for correlation among the 
descriptors xs for each descriptor ^. calculate the ^^ Tf 
Its MusEsa correlation coefficients v„>h . aisaas of 

25 (including „.n, for which the „ descriptors n 

, . i-or wnich the coefficient is necessai-Hl« 

unrty), and subtract this nu^r fro. one to obtain a llghl 
representing the fraction of the vari;,f*„„ ■ T ^ 
Which is not explained by the 'aver«et 7 " ^^^"'^"^ " 
this ..average r=.. method, if we havl Zr f 
>0 ar perfectly correlate, to .achr^ernidtrr-ptrs at 

weights of 1 Teach "^"^ ""^ 

Distances may be calculated by any of a variety of distance measures known in the statistical arts.

The most commonly used distance measure is the Euclidean metric:

$$d_{ij} = \left( \sum_k (x_{ik} - x_{jk})^2 \right)^{1/2}$$

It corresponds most closely to our intuitive sense of distance. The absolute, city block, or Manhattan metric is

$$d_{ij} = \sum_k |x_{ik} - x_{jk}|$$

elr"r''\" " =-1- "nits Of 

Z t Z """" "^^^^ ^''^ -^i"— "l^^her 

.0 one and three on the other. p ^i- on 

The "cosine theta" distance is based on the angle between the vector from the origin to the point for individual i (coordinates x_ik) and the vector from the origin to the point for individual j (coordinates x_jk).

A generalized distance measure is the Minkowski metric:

$$d_{ij} = \left( \sum_k |x_{ik} - x_{jk}|^r \right)^{1/r}$$

which is a Euclidean metric for r = 2 and a city block metric for r = 1.

The Mahalanobis distance measure (D²) is of the form

$$d_{ij} = (X_i - X_j)' \, \Sigma^{-1} (X_i - X_j)$$

where Σ is the pooled-within-groups variance-covariance matrix, and X_i and X_j are the vectors of scores for entities i and j. The Mahalanobis distance allows for correlations between variables; if the variables are uncorrelated, it is equivalent to the (squared) Euclidean distance measured using standardized variables.
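The Mahalanobis form can be evaluated without explicitly inverting Σ, by solving Σz = (X_i - X_j) and taking the dot product. The sketch below does this with a small Gauss-Jordan solver so it needs no external libraries; it assumes the pooled-within-groups covariance matrix has already been computed and is passed in, and the function names are illustrative.

```python
def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(map(float, row)) + [float(rhs[i])] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                for c in range(col, n + 1):
                    aug[r][c] -= factor * aug[col][c]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def mahalanobis_sq(xi, xj, cov):
    """D^2 = (xi - xj)' cov^-1 (xi - xj) for a given covariance matrix."""
    d = [a - b for a, b in zip(xi, xj)]
    z = solve(cov, d)                    # z = cov^-1 d, without forming the inverse
    return sum(a * b for a, b in zip(d, z))


# Uncorrelated variables with variances 4 and 1: D^2 equals the squared
# Euclidean distance after dividing each variable by its standard deviation.
print(mahalanobis_sq([2.0, 0.0], [0.0, 0.0], [[4.0, 0.0], [0.0, 1.0]]))  # 1.0
```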

The Canberra metric, given below, has the advantage of being unaffected by the range of the variable:

$$d_{ij} = \sum_k \frac{|x_{ik} - x_{jk}|}{x_{ik} + x_{jk}}$$

A modified form, which accommodates negative states, is

$$d_{ij} = \sum_k \frac{|x_{ik} - x_{jk}|}{|x_{ik}| + |x_{jk}|}$$
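The Minkowski family, the Canberra metric and the cosine-theta angle can be compared on small vectors. The sketch below implements them directly; treating a 0/0 Canberra term (both values zero) as contributing nothing is an assumed convention, and all names are illustrative.

```python
import math


def minkowski(x, y, r):
    """(sum_k |x_k - y_k|^r)^(1/r); r = 2 Euclidean, r = 1 city block."""
    if r < 1:
        raise ValueError("r must be >= 1 for a metric")
    return sum(abs(a - b) ** r for a, b in zip(x, y)) ** (1.0 / r)


def canberra(x, y):
    """Modified Canberra metric: sum_k |x_k - y_k| / (|x_k| + |y_k|)."""
    total = 0.0
    for a, b in zip(x, y):
        denom = abs(a) + abs(b)
        if denom:                         # assumption: 0/0 terms contribute 0
            total += abs(a - b) / denom
    return total


def cosine_theta(x, y):
    """Angle (radians) between the vectors from the origin to x and to y."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    if norm == 0:
        raise ValueError("angle undefined for a zero vector")
    return math.acos(max(-1.0, min(1.0, dot / norm)))


p, q = [0.0, 0.0], [3.0, 4.0]
print(minkowski(p, q, 2), minkowski(p, q, 1))   # 5.0 7.0
print(canberra([1.0, 2.0], [3.0, 2.0]))         # 0.5
```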

In another measure, for individuals i and j, the distance is the proportion of the entire set (excluding i and j) that have descriptor states intermediate between that for i and that for j, with respect to the descriptors k.

A distance measure may be transformed into a similarity measure by any of a variety of transformations that convert a non-negative number to the range 0 to 1, e.g.,

$$S_{ij} = \frac{1}{1 + d_{ij}}$$

Conversely, a similarity measure may be transformed into a distance, e.g., by d_ij = 1 - S_ij.

If there is a theoretical maximum distance (d_max), based on the theoretically possible ranges for each of the descriptors, the similarity may be expressed as

$$S_{ij} = 1 - \frac{d_{ij}}{d_{\max}}$$

Alternatively, one may calculate the distances between all pairs, and then use the observed maximum distance (d'_max):

$$S_{ij} = 1 - \frac{d_{ij}}{d'_{\max}}$$
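The distance-to-similarity conversions above are one-liners. The sketch below applies the reciprocal form and the maximum-based form, using the observed maximum of a pairwise distance matrix; the function names are illustrative.

```python
def similarity_reciprocal(d):
    """S = 1 / (1 + d): maps any non-negative distance into (0, 1]."""
    if d < 0:
        raise ValueError("distance must be non-negative")
    return 1.0 / (1.0 + d)


def similarity_from_max(d, d_max):
    """S = 1 - d / d_max, for a theoretical or observed maximum distance."""
    return 1.0 - d / d_max


def similarity_matrix(dist):
    """Convert a full pairwise distance matrix using its observed maximum."""
    d_max = max(max(row) for row in dist)
    if d_max == 0:
        raise ValueError("all distances are zero")
    return [[similarity_from_max(d, d_max) for d in row] for row in dist]


print(similarity_reciprocal(1.0))                     # 0.5
print(similarity_matrix([[0.0, 2.0], [2.0, 0.0]]))    # [[1.0, 0.0], [0.0, 1.0]]
```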






Instead of usino the ratio .-u 
actual . ''^^ actual distance to the 

s r^r^Tr^paTrTfrrr- - - 

The diversity of a set of compounds, as measured by a set of descriptors, may be calculated in several ways.

Each compound sweeps out a hypersphere in descriptor space, the hypersphere having a radius known as the similarity radius. The total hypervolume in descriptor space swept out by the set of compounds, for a unit similarity radius, is calculated. This is compared to the hypervolume achievable if none of the hyperspheres overlap, i.e., n times the volume of a single hypersphere, where n is the number of compounds in the set. The swept hypervolume may be determined by Monte Carlo methods. The ratio of the swept hypervolume to the maximum hypervolume is a measure of diversity, ranging from 1 (maximum) to 1/n (minimum).
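The Monte Carlo estimate of swept hypervolume can be sketched by sampling points uniformly in a bounding box and counting those that fall inside at least one hypersphere. This is a minimal illustration, assuming Euclidean descriptor space and uniform sampling; the sample count and the example coordinates are hypothetical.

```python
import math
import random


def swept_volume_ratio(points, radius, samples, rng):
    """Monte Carlo estimate of (volume swept by the spheres) / (n * one sphere).

    Close to 1 when no spheres overlap; 1/n when all compounds coincide.
    """
    dim = len(points[0])
    lo = [min(p[k] for p in points) - radius for k in range(dim)]
    hi = [max(p[k] for p in points) + radius for k in range(dim)]
    box_volume = math.prod(h - b for b, h in zip(lo, hi))
    r2 = radius * radius
    hits = 0
    for _ in range(samples):
        q = [rng.uniform(b, h) for b, h in zip(lo, hi)]
        if any(sum((qk - pk) ** 2 for qk, pk in zip(q, p)) <= r2 for p in points):
            hits += 1
    swept = box_volume * hits / samples
    # Volume of one n-dimensional ball of the given radius.
    ball = math.pi ** (dim / 2) / math.gamma(dim / 2 + 1) * radius ** dim
    return swept / (len(points) * ball)


rng = random.Random(0)
spread = [(0.0, 0.0), (10.0, 0.0)]      # hypothetical, well separated compounds
clumped = [(0.0, 0.0), (0.0, 0.0)]      # two identical compounds
print(round(swept_volume_ratio(spread, 1.0, 20000, rng), 2))   # close to 1
print(round(swept_volume_ratio(clumped, 1.0, 20000, rng), 2))  # close to 0.5
```

In high-dimensional descriptor spaces a bounding-box sampler becomes inefficient, because most of the box lies outside every sphere. A practical implementation would need a better sampling scheme.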

Another approach is to calculate the pairwise distances between compounds in descriptor space; the average distance is a measure of diversity.



the number of clusters arbitrarily, but rather decides the number based on some goodness-of-fit criterion. The resulting

:rt:: ^ -^^^-^^ °^ t the r t : 

of the number of clusters to the number of compounds.

One may calculate a measure of disorder for a descriptor k:

$$H(k) = -\sum_{g=1}^{m} p_g \ln p_g$$

where m is the number of states of the descriptor and p_g is the observed proportion of individuals exhibiting state g. The sum of H(k) for all k is a measure of overall diversity. Standard techniques may be used to correct for correlation.
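The disorder measure H(k) is the Shannon entropy of the observed descriptor states. The sketch below computes it with the natural logarithm (the base is an assumption, since the text does not fix one) and sums it over descriptors without any correction for correlation.

```python
import math
from collections import Counter


def descriptor_entropy(states):
    """H(k) = -sum_g p_g ln p_g over the observed states of one descriptor."""
    n = len(states)
    probs = [c / n for c in Counter(states).values()]
    return sum(-p * math.log(p) for p in probs)


def overall_diversity(columns):
    """Sum of H(k) over descriptors; does not correct for correlation."""
    return sum(descriptor_entropy(col) for col in columns)


print(descriptor_entropy(["high", "high", "low", "low"]))   # ln 2, about 0.693
print(descriptor_entropy(["high"] * 4))                     # 0.0 (no disorder)
```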



Ligand binding depends on complementarity of both shape and functionality. The functionality in the binding site is an array of recognizable groups (hydrophobic groups, hydrogen bond donors, hydrogen bond acceptors, and stacking groups) that is complementary to the ligand. Many of these functional groups are reactive towards common chemical modification reagents, such as hydroxyl radical, MnO4^-, NaBH4, and dimethyl sulfate (10), and towards "designer" reagents, usually based on transition-metal complexes (11, 12). These reagents can react with functional groups via a wide range of pathways, including oxidation, reduction, hydrolysis, and alkylation. The array of functional groups in a binding site will therefore exhibit a unique pattern of reactivity toward each of these reagents, and this array of rates will provide a descriptor for the chemical functionality of the binding site. A set of descriptors can thus be obtained by measuring the relative rates of reaction of the functional groups in the binding site with a series of chemical modification reagents.
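To illustrate treating relative rates as a binding-site descriptor, the sketch below scales each site's rates so that its fastest reagent equals 1 and compares sites by Euclidean distance. The reagent labels, the max-scaling, and the distance metric are assumptions for illustration, not part of the described method.

```python
import math

# Hypothetical labels for a three-reagent panel.
REAGENTS = ("Pt2(pop)4", "Ru(tpy)(bpy)O", "Ru(bpy)3")


def reactivity_descriptor(rates):
    """Turn a reagent -> measured rate mapping into relative rates (fastest = 1)."""
    top = max(rates[r] for r in REAGENTS)
    return [rates[r] / top for r in REAGENTS]


def descriptor_distance(a, b):
    """Euclidean distance between two binding-site reactivity descriptors."""
    return math.dist(a, b)
```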

It may be the case that most of the solvent-accessible functionality lies in the binding site; however, it will be useful to know that the reactivity descriptors are binding-site specific. For example, the structure of lysozyme shows that there are two very important tryptophan residues in the binding site, so it will be important that the measured rate reflects those residues and not the other Trp residues in the protein. On the other hand, the binding-site Trp residues are by far the most accessible and will probably dominate the reactivity. The easiest ... in rates is when the ... . Alternatively, the chemical modification reagent ... .

The presently preferred reagents are transition-metal complexes, e.g., Pt2(pop)4^4-, Ru(tpy)(bpy)O^2+, and Ru(bpy)3^3+, whose rate constants can be measured by optical spectroscopy or by Stern-Volmer quenching spectroscopy. Ru(tpy)(bpy)O^2+ oxidizes DNA by abstraction of hydrogen atoms from the deoxyribose ring and by a somewhat more efficient pathway involving inner-sphere oxidation of guanine at C8 (bpy = 2,2'-bipyridine; tpy = 2,2',2''-terpyridine) (18, 19). The oxoruthenium(IV) system has been very useful in ... the relative ... toward exogenous oxidants and the relative C-H bond strengths of the sugar hydrogens (15, 16). In addition, steric factors ... inner-sphere guanine oxidation ... have been carefully studied (20). ... is summarized ... .

Cleavage of DNA by hydroxyl radical, generated via the reaction of Fe(EDTA)^2- with hydrogen peroxide, is a powerful method for imaging the structures of unusual DNAs and DNA-protein complexes (21). A parallel approach to the Fe(EDTA)^2-/H2O2 system, based on the tetraanionic complex Pt2(pop)4^4- (pop = P2O5H2^2-), has been developed (22-24). This complex abstracts hydrogen atoms from organic substrates upon photolysis. In duplex DNA, hydrogens are abstracted from the 4' and 5' positions of the sugar, and electrons are abstracted ... . The rate constants for reactions of Pt2(pop)4^4- can be measured by Stern-Volmer quenching of the emissive excited state (23).

Ruthenium(III) complexes. The reaction pathways available to the oxoruthenium(IV) and Pt2(pop)4^4- complexes include inner-sphere reactions, where ... . In addition to the inner-sphere reaction pathways described above, such as hydrogen atom abstraction from nucleotides, there are outer-sphere pathways, in which only an electron is transferred, by tunneling, from one reactant to another. The advantage of these pathways is that there is a spherically symmetric distance dependence to the reaction probability, while inner-sphere pathways generally require a specific approach of the reagent to the reactant. This difference will be important here in defining a diverse set of reaction pathways for the reagents. Complexes based on Ru(bpy)3^3+ react with substrates only via outer-sphere electron transfer. Therefore, unlike Pt2(pop)4^4-, which can oxidize substrates via both inner-sphere hydrogen atom transfer and electron transfer, Ru(bpy)3^3+ provides a set of reactions distinct from those of Pt2(pop)4^4-. We have used electron-transfer reactions between guanine and Ru(bpy)3^3+ as a very sensitive probe of solvent accessibility in DNA-protein complexes (27-...), for example to detect a single-base mismatch at guanine or the flipping of a paired cytosine into the active site of a DNA repair enzyme. The absolute rate constants for reactions of Ru(bpy)3^3+ can be measured by optical spectroscopy, as with Ru(tpy)(bpy)O^2+.

Reactivity profiles. The amino acid "reactivity profiles" will be determined for each of the proposed reagents. The rate constants for the reagents may be determined with any or all of the 20 amino acids, and these 20 rate constants will then provide a profile for each reagent. The desired result is a different profile for each reagent. For example, if Ru(tpy)(bpy)O^2+ and Ru(bpy)3^3+ both react mostly with one-electron donors like tyrosine and tryptophan, then the difference between the two descriptors will be less informative. The rate constants will be measured for the free amino acids and for a series of dipeptides to show that the individual rate constants add together linearly and can be discriminated by the three reagents.
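The linear-additivity test for dipeptides amounts to comparing a measured rate constant with the sum of the free amino acid rate constants. A minimal sketch, assuming one-letter amino acid codes and hypothetical function names:

```python
def additive_rate(sequence, residue_k):
    """Additive model: a peptide's rate constant is the sum of its residues' constants."""
    return sum(residue_k[aa] for aa in sequence)


def additivity_deviation(sequence, measured_k, residue_k):
    """Fractional deviation of a measured peptide rate constant from the additive model."""
    predicted = additive_rate(sequence, residue_k)
    return (measured_k - predicted) / predicted
```

A deviation near zero supports additivity; a clearly negative deviation would suggest that linking the residues attenuates their reactivity.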

The three preferred reagents and their reactivities are well suited to discriminating the 20 amino acids. For example, Pt2(pop)4^4- abstracts hydrogen atoms from weak C-H bonds in organic substrates. Shown in Fig. ... are amino acids, with the C-H bonds likely to be ... highlighted. The C-H activation chemistry of Ru(tpy)(bpy)O^2+ is distinct from that of Pt2(pop)4^4- in that inner-sphere adducts of the ruthenium-oxo linkage ..., which favors activation of alcohol ... aliphatic functionality. Thus Ru(tpy)(bpy)O^2+ is likely to prefer serine, threonine, and cysteine more than Pt2(pop)4^4- does. ... chemistry on the tryptophan and histidine rings or the terminal amines of arginine ... and cysteine ... possibilities with Ru(bpy)3^3+ ... to varying extents can be expected with all three reagents (10). These pathways will be strongly dependent on the folding of the protein, which should give excellent specificity. The ... will also be observed to some extent with Pt2(pop)4^4-.

... will be tested using peptides. For a large array of ..., the rate constant should be approximately the average for all of the amino acids taken separately, weighted by the solvent accessibility of the individual groups in the folded protein. Therefore, the rate constant for a dipeptide should be the sum of the rate constants for the individual amino acids, until secondary structure develops to attenuate the reactivity of certain groups that are protected by the tertiary structure. It will be instructive to measure the rate constants for some representative random-coil peptides to show how simply linking the amino acids modulates the observed rate constants.
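The accessibility weighting can be sketched as follows. The sketch assumes fractional solvent accessibilities between 0 (buried) and 1 (fully exposed) and combines residue rate constants as a weighted sum, so that the fully exposed, denatured limit reduces to the additive dipeptide rule; the text speaks of a weighted average, so this normalization is an assumption, and the function names are hypothetical.

```python
def protein_rate(residues, residue_k):
    """Accessibility-weighted estimate of a protein's rate constant.

    residues: (one-letter amino acid code, fractional solvent accessibility) pairs.
    residue_k: rate constants measured for the free amino acids.
    """
    return sum(residue_k[aa] * accessibility for aa, accessibility in residues)


def unfolded_rate(sequence, residue_k):
    """Denatured limit: every residue fully exposed, i.e. the plain additive sum."""
    return protein_rate([(aa, 1.0) for aa in sequence], residue_k)
```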

The rate constants may be measured for the three reagents under three conditions: denatured protein, folded protein, and folded protein with the bound BioKey molecule. The objective is for the difference in rate constants for folded and denatured protein to give a descriptor of the surface area of the folded protein (weighted by the solvent-exposed amino acids), and for the rate constants with and without the bound BioKey molecule to give a descriptor of the amino acids in the binding site. A complication is that the bound BioKey may present new solvent-accessible residues that react with the transition-metal complex. To test for this complication, BioKey peptides with a scrambled amino acid sequence will be included ... .

Unfolded proteins may be generated chemically by addition of a denaturing agent, such as guanidinium hydrochloride; we have shown elsewhere that this denaturant does not deactivate our reagents (24). If the denaturant is problematic, ... change in biomolecular structure.
The rate constant for the free BioKey peptide may be determined separately and used to correct for additional oxidation of peptide side chains not blocked by the protein. The rate constant for a peptide with the same amino acid composition, but with a scrambled sequence, may also be of interest.

This approach is illustrated in Figure 3. In the unfolded protein, all of the amino acids will be accessible to the transition-metal complex. In the folded protein, only the surface residues will be accessible to the reagent. In the ligand-bound form, only the surface residues not occluded by the ligand (i.e., those outside the binding site) will be accessible to the reagent.
The protein oxidation rates can be determined by quantifying the disappearance of the ... using ... analytical methods, such as quantitating the ... protein by mass spectrometry, ... analysis, or changes in the fluorescence of probes that bind ... . The resulting rate is a new quantitative descriptor of the protein for that reagent. ... measures the rate of ... or the rate of appearance of the product of the reagent on a particular amino acid. ..., amino acid-specific, or all of these approaches may be used to generate descriptors. The rate reflects the accessibility of each residue in the ... structure, so the difference in the rates for the folded and unfolded proteins will give a quantitative description of the degree of folding and of the solvent-accessible surface. When a ligand binds to the active site, the active-site residues ... . The difference in the rates with and without the ligand will therefore be a quantitative descriptor of the ... of residues in the active site.
The collected data are used to construct a database containing in vitro descriptors of proteins. For each protein/reagent combination, the database contains the relative rates in the unfolded, folded, and ligand-bound states. The relative rates are normalized and entered into the database as points in a three-dimensional space, where each point corresponds to a particular protein in one of the three states. More dimensions may be added by including amino acid-specific rate measurements, such as surface hydrophobicity changes. In general, proteins that exhibit large changes in rate from the unfolded to the folded state are those with compact structures and a large number of ... residues. Large differences between the folded and BioKey-bound states indicate large active sites. These effects will be observed as averages across the entire set of reagents, so geometric features will be apparent as averages for all reagents. Conversely, differences between individual reagents will relate to the composition of the protein surface and active site, because some residues that are protected will be ... towards some reagents and not others. Thus, the database will provide considerable information on both the geometry and the composition of ligand binding sites for particular compounds.
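The normalization into per-state points, and the split between geometric (reagent-averaged) and compositional (reagent-specific) information, can be sketched as below. Dividing each reagent's rates by its largest observed rate is an assumed normalization, the state labels and function names are hypothetical, and the interpretation follows the paragraph above rather than an established algorithm.

```python
from statistics import mean


def normalize_rates(rates):
    """Scale rates[(protein, state)][reagent] into 0..1 separately for each reagent.

    Assumption: each reagent's rates are divided by the largest rate observed
    for that reagent anywhere in the data set.
    """
    reagents = {r for per_reagent in rates.values() for r in per_reagent}
    top = {r: max(v[r] for v in rates.values() if r in v) for r in reagents}
    return {key: {r: k / top[r] for r, k in v.items()} for key, v in rates.items()}


def as_point(normalized, key, reagent_order):
    """Coordinates of one (protein, state) point: one axis per reagent."""
    return tuple(normalized[key][r] for r in reagent_order)


def geometry_and_composition(normalized, protein, reagent_order,
                             state_a="folded", state_b="bound"):
    """Split a state-to-state rate change into its across-reagent average
    (read as geometric) and per-reagent deviations (read as compositional)."""
    changes = {r: normalized[(protein, state_a)][r] - normalized[(protein, state_b)][r]
               for r in reagent_order}
    average = mean(changes.values())
    return average, {r: c - average for r, c in changes.items()}
```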

The three reagents described thus far can be generalized by substitution and still ... characterization. For example, ... can be substituted with ... substituents ... oxidation and hence the reactivity toward different groups in the polymer. For example, we showed that a very electron-rich derivative, Ru(bpy)2(4-...-py)O^2+, abstracts ... hydrogens from thymidine sugars and ... (20). Reactivity of one-electron ... modulated by electron-releasing ... making the complex ... change the electron-transfer ... . These changes should modulate the reactivity ... of solvent accessibility ... reagents that return differentiated descriptors for the protein targets.



wo 99/06839 



PCT/US98/15943 



42 



The transition-metal reagents are attractive because of the facility with which they can be modified and the ease with which absolute rate constants can be measured (17); however, mining the potential information available from the wide range of known chemical modification reagents requires general methods for measuring the relative rates. As discussed earlier, chemical modification reactions have been much more widely applied to the study of nucleic acid structure than to the study of protein structure because of the greater lability of the phosphodiester backbone. Indeed, reactions that modify DNA nucleotides lead to strand scission, if not immediately, then almost always after base treatment (43). A very sensitive method for measuring DNA modification is plasmid isomerization (44, 22). This method is thus also a sensitive method for measuring the instantaneous concentration of the modification reagent. Therefore, competition of protein targets with plasmid DNA for modification by the exogenous reagent may provide a convenient means for sensitively measuring the relative rate constants.
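Competition measurements of this kind are commonly analyzed with the standard competition-kinetics relation: if both competitors are consumed by the same reagent in reactions that are first order in each competitor, then ln(fraction of protein remaining) / ln(fraction of DNA remaining) equals k_protein / k_DNA. A minimal sketch under those assumptions (for plasmid DNA, the fraction remaining would be the fraction still unmodified; the function name is hypothetical):

```python
import math


def relative_rate_constant(frac_protein_left, frac_dna_left):
    """k_protein / k_DNA from the fractions of each competitor remaining unmodified.

    Both competitors see the same reagent concentration over time, so
    ln(P/P0) = -k_P * integral([R] dt) and ln(D/D0) = -k_D * integral([R] dt);
    the unknown integral cancels in the ratio.
    """
    if not (0 < frac_protein_left < 1 and 0 < frac_dna_left < 1):
        raise ValueError("fractions remaining must lie strictly between 0 and 1")
    return math.log(frac_protein_left) / math.log(frac_dna_left)
```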
Other approaches involve detection of modified and unmodified protein by MALDI mass spectrometry (45) or detection of unreacted reagent by HPLC, with radiolabeling if necessary. The approach described above can be applied to other reagents, such as dimethyl sulfate, MnO4^-, NaBH4, and hydroxyl radical. Diversifying the list of reagents will ensure that all twenty of the amino acids are assessed in the resulting descriptors.

A parallel approach described below is to assess the extent of modification of different families of amino acids. For example, the change in surface hydrophobicity of a protein can be measured by the fluorescence of 8-anilino-1-naphthalenesulfonic acid (ANSA) (45). An increase in hydrophobicity measured this way is observed upon protein oxidation and is very well correlated with formation of ... and oxidation of methionine. Hydrophobicity and other methods that monitor modification of specific sets or subsets of amino acids, such as changes in reactive carbonyl groups or formation of oxidized methionine (36), will give descriptors that are distinct from those measured by disappearance of the protein or reagent, which give the reactivity of the entire protein. We can therefore envision more dimensionality in the database, where the detection method is diversified and yields a discrete set of amino acid-specific rate constants. Determination of the reactivity profiles for each reagent will then be ... in assessing the difference in these ... descriptors for different ... targets.

... the ... constants for reaction of the three ... . These methods are described briefly here.

... allows the absolute rate of ... to be measured. These measurements ... spectrophotometer or, for particularly ..., a rapid-scanning stopped-flow apparatus. For complex mechanisms, the returned time-dependent optical spectra can be subjected to factor analysis ... detailed mechanistic information. ... the method of initial rates ... the reactivity of the amino acids and peptides ... one-electron reagents based on Ru(bpy)3^3+ can be ... by changes in optical spectra.

Stern-Volmer quenching ... in reactions where emissive excited states, such as that of Pt2(pop)4^4-, react with substrates of interest. In this method, the emission of the complex in the absence of the quencher (I0) divided by the emission in the presence of the quencher (I) is linear in the concentration of the quencher [Q], according to:

$$\frac{I_0}{I} = 1 + k\tau[Q] \qquad (1)$$

where τ is the emission lifetime (10 μs) for Pt2(pop)4^4- and k is the second-order rate constant for the reaction of Pt2(pop)4^4- with the substrate quencher.
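Equation (1) can be fitted to quenching data to recover k. The sketch assumes a series of I0/I ratios measured at known quencher concentrations, fits the slope of (I0/I - 1) against [Q] by least squares through the origin, and uses the 10 μs lifetime quoted for Pt2(pop)4^4- as the default; the function name is hypothetical.

```python
def stern_volmer_k(quencher_conc, intensity_ratios, lifetime_s=10e-6):
    """Second-order quenching rate constant k from I0/I = 1 + k * tau * [Q].

    quencher_conc: [Q] values in M; intensity_ratios: measured I0/I at each [Q].
    The slope k * tau is fitted through the origin, then divided by tau.
    Returns k in M^-1 s^-1.
    """
    num = sum(q * (r - 1.0) for q, r in zip(quencher_conc, intensity_ratios))
    den = sum(q * q for q in quencher_conc)
    return num / den / lifetime_s
```

Forcing the fit through the origin reflects the fixed intercept of 1 in equation (1).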

A method for measuring the rate constants by competition with plasmid isomerization is shown in Figure 4. Reaction of the ... plasmid is as much as 4 ... . ... is desired. When the protein ... constant ... . The assay can be performed for any reagent that modifies both DNA and amino acids.

A drawback to the approach in Figure 4 is that the assay cannot be performed for DNA-binding proteins. ... simplicity and low quantity ... attractive. A more general ... oxidized protein ... by MALDI mass spectrometry, which can detect small changes in the molecular weight of ... 0.1 pmol of protein (for a recent review, ...). In the simplest case, the mass spectrometer would detect the change in the concentration of the protein as a means for determining the rate of modification by the reagent. ... during reaction with the protein ... traditional analytical chemistry techniques ... for suitable substrates. The use of ... reagents will provide the desired ... rates ... rather than to measure the total rate for all of the amino acids. Such assays would involve quantitation of ... specificity in the ... individual protein information.

For a crystallographically characterized protein, the reactivity descriptors should be predictable from the three-dimensional structure and the reactivity profile, which describes the inherent reactivity of each amino acid toward the reagent. The solvent accessibility of each amino acid in the folded protein can then be used to weight its inherent reactivity. The ability to ... carefully will depend on ... .

The amino acid composition of ligand-binding surfaces has been determined for proteins whose crystal structures with bound ligands are known; Trp, His, Arg, and Tyr are enriched in sites contacting bound ligands ... . Gly and Ser are often found near the binding site; however, these residues are generally abundant throughout the protein. Therefore, reagents that react with residues found at very high frequency in the binding site, such as Ru(tpy)(bpy)O^2+, which oxidizes ..., or reagents that alkylate ring nitrogens or oxidize ... ring nitrogens, will be very informative. Importantly, ... found on protein surfaces ... on average, whereas the distribution in binding sites is dramatically different. Therefore, residues protected by binding of a ligand will give a very different profile from those protected by folding of the protein.
from those protected by folding of the 
Database construction

The ... database ... from the S, A, and T databases. In a presently preferred embodiment, for each protein and each reagent (or each reaction condition, if a reagent is used under several different reaction conditions), the database contains information regarding reaction rates in the unfolded, folded, and ligand-bound states. Rate constants may be provided for several different ligand-bound states, i.e., with different ligands. Preferably, the rate information is expressed in relative terms and normalized on a scale from zero to one. The rate information may also be expressed in difference form, e.g., by providing the (rate folded - rate unfolded), (rate ligand-bound - rate folded), (rate ligand-bound - rate unfolded), and/or (rate ligand 1 bound - rate ligand 2 bound). The database may be viewed as ... for each reagent for a given protein in one of the three states, although its most efficient representation is likely to be that of a relational database.
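The difference-form representation can be generated mechanically from the per-state rates. In this sketch the state keys, including the "bound:<ligand>" naming for ligand-bound states, and the function name are hypothetical, and the rates are assumed to be already normalized.

```python
def difference_descriptors(rate):
    """Difference-form descriptors for one protein/reagent pair.

    rate: normalized rates keyed 'unfolded', 'folded', and 'bound:<ligand>'
    (one entry per ligand-bound state).
    """
    out = {"folded-unfolded": rate["folded"] - rate["unfolded"]}
    bound = {k.split(":", 1)[1]: v for k, v in rate.items() if k.startswith("bound:")}
    for ligand, v in bound.items():
        out[f"bound[{ligand}]-folded"] = v - rate["folded"]
        out[f"bound[{ligand}]-unfolded"] = v - rate["unfolded"]
    ligands = sorted(bound)
    # Pairwise comparisons between different ligand-bound states.
    for i, a in enumerate(ligands):
        for b in ligands[i + 1:]:
            out[f"bound[{a}]-bound[{b}]"] = bound[a] - bound[b]
    return out
```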

In a preferred embodiment, the various databases are normalized in accordance with standard relational database practice to minimize the duplication of information. In this embodiment, a reaction database may contain additional information about each reaction, such as the chemical name of the reagent, the reaction conditions (e.g., temperature, solvent, ...), and the assay method. If the same reagent is used under many different reaction conditions, it may be desirable to also create a reagent database, relationally linked to the reaction database, so that information about the reagent is placed in only one database record.



Another database may tabulate target proteins. Each record in this database will contain a target ID, and may optionally contain additional information about the target protein, such as its name, biological activity, ... . A ... database could provide ... of the "State" database, with additional fields which ..., e.g., a lookup field for retrieval of information from a ligand database.

The reaction, protein, and protein state databases are relationally linked to the reactivity database, with the reaction ID used for lookup of a reaction and the target protein ID used for lookup of a target protein. Thus, each record of the reactivity database will contain fields for the reaction ID, the target protein ID, and the result of the reaction (e.g., a rate constant).

There is a many-to-one relationship between the reactivity database and the reaction database, and also between the reactivity database and the target protein database. The relationship between the drugs and the target proteins is preferably normalized by creating a database of individual drug/protein interactions.
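The relational layout described above can be sketched with Python's built-in sqlite3 module. Table and column names are hypothetical, the state column stands in for a link to the protein state database, the drug_target table is one conventional way to record individual drug/protein interactions, and the inserted rates are placeholder values, not measurements.

```python
import sqlite3

SCHEMA = """
CREATE TABLE reaction (reaction_id INTEGER PRIMARY KEY, reagent TEXT NOT NULL,
                       temperature_c REAL, solvent TEXT, assay_method TEXT);
CREATE TABLE target (target_id INTEGER PRIMARY KEY, name TEXT, activity TEXT);
-- Many reactivity rows point to one reaction and to one target (many-to-one).
CREATE TABLE reactivity (
    reaction_id INTEGER NOT NULL REFERENCES reaction(reaction_id),
    target_id   INTEGER NOT NULL REFERENCES target(target_id),
    state       TEXT NOT NULL,   -- e.g. unfolded / folded / bound
    result      REAL NOT NULL    -- e.g. a normalized rate constant
);
-- One row per individual drug/protein interaction.
CREATE TABLE drug_target (
    drug_id   INTEGER NOT NULL,
    target_id INTEGER NOT NULL REFERENCES target(target_id),
    PRIMARY KEY (drug_id, target_id)
);
"""


def build_demo_db():
    """In-memory database with placeholder rows for illustration."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    db.execute("INSERT INTO reaction VALUES (1, 'Ru(bpy)3', 25.0, 'water', 'optical')")
    db.execute("INSERT INTO target VALUES (10, 'lysozyme', NULL)")
    db.executemany("INSERT INTO reactivity VALUES (1, 10, ?, ?)",
                   [("unfolded", 1.0), ("folded", 0.4)])
    return db


def rates_for_target(db, target_id):
    """Reagent, state, and result for every reactivity record of one target."""
    return db.execute(
        """SELECT r.reagent, x.state, x.result
           FROM reactivity AS x JOIN reaction AS r USING (reaction_id)
           WHERE x.target_id = ? ORDER BY x.state""", (target_id,)).fetchall()
```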
It is also desirable to have a drug database, in which each record identifies a different drug. ... have more than one activity, ... relevant drugs for each different spectrum of ... value, potency, ... . It will be appreciated by those skilled in the programming art that there are many ways of organizing the information of interest. Therefore, the present invention is not limited to any particular database design.

Use of the Reactivity Database

The reactivity database may be used in lead generation as hereafter described. As discussed above, the initial ... proteins ... whose biology and expression ... are understood. Once these proteins are in the database, ... can be determined for proteins that have ... combinatorial libraries, whose three-dimensional structures are not known, and/or whose biological functions have not been determined.

It will be appreciated that there is no ... of interest. The more database proteins there are ... descriptors, and the more similar those descriptors are, the more useful the ... . The likelihood of finding ... proteins is ... as the number of proteins in the database increases. Preferably, the database will provide data on at least 50 proteins, more preferably at least 200, still more preferably at least 1000, and most preferably every known protein. ... of the proteins in the database ..., the more likely it is that at least ... a reasonably similar reactivity ... .

For each protein, the database ... rate constants (or rate constant differences) for all characterizing reactions to which the protein has been subjected. The reactivity ... and the similarity between ... proteins in the database will be determined as described in the section on "Descriptors." ... on the number of reactions to which the proteins are subjected and on the number of proteins ...; more data ... characterization. Preferably, there are at least ..., still more preferably at least 10 ... . ... less easily defined, the greater the diversity in the ...
greater the aiversity in the 
Relation to databases with information such as A will predict what types of chemicals will likely bind to the unknown target. Targets that are similar in ... will bind similar compounds. This procedure will ... a known inhibitor ... inhibit similar targets. Relation to an expression database such as T will provide pharmacological information on the unknown target.
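The comparison of an unknown target against the database can be sketched as a nearest-neighbor search over reactivity descriptors. Euclidean distance and the function name are assumptions; any of the similarity measures in the section on "Descriptors" could be substituted.

```python
import math


def most_similar_targets(query, database, top_n=3):
    """Rank database proteins by distance between reactivity descriptors.

    query: normalized rates (or rate differences) for the unknown target.
    database: mapping of target name -> descriptor with the same ordering.
    """
    ranked = sorted(database.items(), key=lambda item: math.dist(query, item[1]))
    return [name for name, _ in ranked[:top_n]]
```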



Aptamer-Based Descriptors for Protein Binding Sites

An aptamer-based descriptor characterizes a protein in terms of the aptamers which recognize it. While ... could serve as aptamers, the preferred aptamers are nucleic acids.

This descriptor is not simply a numerical value. Rather, it is a list of sequences (and, preferably, structures and contact points) for each of the aptamers identified as binding a particular protein. In this section, we will describe how nucleic acid aptamers are characterized, and how the similarity of the aptamers for different proteins may be ... .

In essence, a single-stranded oligonucleotide library is screened to identify aptamers which bind the protein with the desired affinity. This protein may be a reference protein with known drug antagonists, or the target protein for which antagonists are to be identified. The selected aptamers are amplified and sequenced.

The aptamers serve to characterize the protein, since only those oligonucleotides which can ... the protein will bind to it. One may think of the aptamers as "impressions" of the protein surface. Some aptamers may be expected to bind the protein at the active site, overlapping or otherwise occluding the active site of the protein. Such aptamers may, if desired, be identified by screening the aptamers for antagonist activity. Others will bind at sites distal to the active site ... the same site, others ... . All contribute to a "picture" of the protein.

Preferably, the contact sites (the bases within the aptamer through which the aptamer contacts the protein) are identified. A preferred means for such identification is a footprinting reaction, in which chemical modification of the nucleic acid is performed with and without the bound protein; sites where bound protein blocks chemical modification are contact sites. Identification of contact sites in this manner can be achieved using enzymes such as DNase or chemical reagents such as copper-phenanthroline (Papavassiliou, A.G., Biochem. J., 305:345-357), Fe(EDTA)^2- (Pogozelski, et al., J. Am. Chem. Soc. ...), ... .

Once the contact sites have been determined, the secondary structure of at least the contacting bases of the oligonucleotide is analyzed. The secondary structures of the aptamers can be predicted using the approach of Zichi et al. (J. P. Davis, N. Janjic, D. Pribnow, D. A. Zichi, Nucleic Acids Res. 1995, 23, 4471-4479), in which a two-dimensional color grid is used to indicate sites of potential base pairing, revealing the underlying secondary structure. More traditional nucleic acid folding approaches are those of Zuker, which can be found in ..., Turner, and M. Zuker, Proc. Natl. Acad. Sci. USA 86, 7706-7710. The contact sites, determined using the footprinting reactions described above, are mapped onto the predicted secondary structure, and the functionality sequence is the list of nucleotides that contact the protein, read from the 5' to the 3' end. ... experimentally, see Tinoco, Jr., J. Phys. Chem., 100:13311-22.

The functionality sequence preferably identifies not only the overall sequence of the aptamer, but also the contact site. In the case of an oligonucleotide, the bases of the contact site are preferably described not only by identifying the base itself, but also by indicating whether, in the secondary structure of the aptamer, it is paired to any base and, if so, to which one, in a convenient notation. To indicate the secondary






structure at each contact site, two-letter codes are used for each nucleotide. For single-stranded nucleotides, the contact nucleotides are followed by the small letter "o", so the contacts are Ao, To, Go, and Co for a DNA aptamer. For double-stranded nucleotides, the contact sites are followed by a small letter representing the nucleotide on the opposite strand. For example, if an A paired with a T is protected from cleavage, that site is listed as "At". Both base pairs and mismatches are represented this way, so the entire list of double-stranded codes is (for DNA): At, Aa, Ag, Ac, Gc, Gg, Gt, Ga, Cg, Cc, Ct, Ca, Ta, Tg, Tt, Tc. When combined with the four single-stranded codes, there are 20 possible functionality elements.
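The 20 functionality elements follow mechanically from the coding rule. A minimal sketch for DNA aptamers (function names hypothetical):

```python
DNA_BASES = "ATGC"


def functionality_elements(bases=DNA_BASES):
    """The 4 single-stranded codes (Xo) plus the 16 double-stranded codes (Xy)."""
    single = [b + "o" for b in bases]
    double = [b + partner.lower() for b in bases for partner in bases]
    return single + double


def encode_contacts(contacts):
    """contacts: (contact base, opposite-strand base or None) pairs read 5' to 3'."""
    return [base + ("o" if partner is None else partner.lower())
            for base, partner in contacts]
```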

An alternative to comparing the footprinted sequence in a linear array would be to develop a two-dimensional projection of the secondary structure of the aptamer. For example, suppose the aptamer at the right is selected for a given target and the sites indicated with an arrow are determined to be protected by the target via Pt-pop footprinting. The aptamer can then be mapped onto a two-dimensional grid where the contact sites are coded as described above and the remaining "placeholder" sites are coded as either Nn for a base pair or mismatch site or No for a single-stranded site (see Figure 5). There is no need to differentiate the base pair and mismatch placeholder sites, since this information is already present in the contact site codes. The two-dimensional grid can now be analyzed for similarity to other two-dimensional representations by ... approaches, such as those used for ... similarity (as in Patterson et al., J. Med. Chem., 1996).

The contact sites and secondary structure are used to develop a functionality sequence (epitope), which represents both the primary sequence and the secondary structure of the nucleic acid moiety which contacts the target protein. These functionality sequences are entered into a database that is used to compare functionality sequences for unknown targets. Targets with homologous functionality sequences bind small molecules of a similar nature. Thus, if a connection is drawn between an unknown target and a target for which small-molecule binders are known, screening of the unknown target can be restricted initially to libraries of molecules similar to those that bind the known target. Functionality sequences can be determined on small quantities of targets ... . ... combinatorial peptides and ... mutagenesis can also be envisioned.
Sequence identity among aptamer contact sites may be determined using an adaptation of BLAST from the National Center for Biotechnology Information. The BLAST (Basic Local Alignment Search Tool) algorithm is described in [...]. [...] secondary structure [...] functionality [...] can be adapted from art in [...]. [...] is used. If the aligned functionality elements are the same, the score for that pair of elements is one; if they differ, the score is zero. The individual scores are summed and divided by the total number of elements [...].
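The identity scoring just described can be sketched in Python as follows. This is a minimal illustration, not the BLAST adaptation itself: it assumes the two functionality sequences have already been aligned to equal length, and it represents each functionality element as a short code such as "Gc" or "No", extending the Nn/No placeholder convention. The function name and the codes are hypothetical.

```python
from typing import Sequence

def identity_score(a: Sequence[str], b: Sequence[str]) -> float:
    """Fraction of aligned positions whose functionality elements are identical.

    `a` and `b` are already-aligned functionality sequences of equal length,
    one element code per position (e.g. "Gc", "Ga", "No"). Any gap symbol
    simply counts as a non-identical element.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        return 0.0
    # Score 1 for each identical pair of elements, 0 otherwise ...
    matches = sum(1 for x, y in zip(a, b) if x == y)
    # ... then divide by the total number of aligned elements.
    return matches / len(a)

print(identity_score(["Gc", "No", "Ta", "Go"], ["Gc", "No", "Tg", "Go"]))  # 0.75
```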

A second approach would be to weight most heavily the alignments least likely to occur by chance. This would require tabulating predicted secondary structures for a large number of DNA sequences. This could be done by (1) randomly generating DNA sequences of the length and composition to be used in the [...], (2) predicting the secondary structure of each sequence, (3) converting the string of bases into a string of functionality elements, and (4) calculating the probability of each alignment. Of course, this [...] to protein contact sites [...].

[...] a base pair is likely to be more informative than a mismatch (Gt, Tg, Ga, Ag, etc.) or a single-stranded site [...]. These elements could therefore be weighted to allow the most informative elements to be considered most heavily in determining similarity. For one possible weighting, see Example [...].
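A sketch of the weighting idea, together with a chance-based significance estimate, follows. The numeric weights (3 for a Watson-Crick pair, 2 for a mismatch, 1 for a single-stranded site) are assumptions, since the text gives only their ordering. As a further simplification, random decoys are drawn element by element from a background list of element codes, rather than by generating random DNA sequences and predicting their secondary structures. All names are hypothetical.

```python
import random
from typing import List, Sequence

# Hypothetical weights: the text only says base pairs are likely more
# informative than mismatches or single-stranded sites.
WATSON_CRICK = {"Gc", "Cg", "At", "Ta"}

def element_weight(code: str) -> float:
    """Weight for one element code: base + partner ('o' = single-stranded)."""
    if code[1] == "o":          # single-stranded site, e.g. "Go" or "No"
        return 1.0
    if code in WATSON_CRICK:    # Watson-Crick base pair, e.g. "Gc"
        return 3.0
    return 2.0                  # mismatch (or "Nn" placeholder), e.g. "Ga"

def weighted_score(a: Sequence[str], b: Sequence[str]) -> float:
    """Identity score in which more informative elements count more."""
    total = sum(max(element_weight(x), element_weight(y)) for x, y in zip(a, b))
    hits = sum(element_weight(x) for x, y in zip(a, b) if x == y)
    return hits / total if total else 0.0

def empirical_p_value(a: Sequence[str], b: Sequence[str], background: List[str],
                      trials: int = 2000, seed: int = 0) -> float:
    """Estimate how often a random decoy scores against `a` at least as well as `b`.

    Simplification: decoys are drawn element by element from `background`
    instead of predicting structures for randomly generated DNA sequences.
    """
    rng = random.Random(seed)
    observed = weighted_score(a, b)
    hits = 0
    for _ in range(trials):
        decoy = [rng.choice(background) for _ in range(len(a))]
        if weighted_score(a, decoy) >= observed:
            hits += 1
    return (hits + 1) / (trials + 1)   # add-one so the estimate is never zero
```

A low value indicates an alignment unlikely to arise by chance, which under this second approach would be weighted most heavily.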

While nucleic acids are preferred, because they can be amplified, peptides may also be used to generate aptamer-based descriptors. Preferably, these peptides are 5 to [...] amino acids in length. The peptide library is screened for binding to the query protein, and the peptide aptamers are compared to those of [...]. [...] may be used to compare the amino acid sequences. Any scoring matrix conventional in the art [...].

[...] determining aptamer similarity on the basis of [...]. For nucleic acid aptamers, a [binary fingerprint may be used, in which each] bit represents the presence or absence of a particular secondary structure, such as an unbonded region, an unbonded region of a particular length (e.g., [...] bases, 7-20 bases, >20 bases), an interior loop, an interior loop of a particular length (e.g., 2-6 bases, 7-20 bases, >20 bases), a [...] of a particular length (e.g., 1 base, 2-3 bases, 4-7 bases, [...]), a hairpin loop of any length and type, a hairpin loop closed by G:C, a hairpin loop closed by T:A, a particular type of hairpin loop of a particular length (e.g., 3 bases, 4-5 bases, [...], 10-30 bases, >30 bases), a run of paired bases of a particular length, an overall base composition in a particular range (e.g., 40-60% GC), etc. See Tinoco, et al., Nature New Biol., 246:40-41 (1973), and Tinoco, et al., Nature, 230:362-7 (1971), for an approach to estimating RNA structure which [...] fingerprinting. [...] a bit representing the presence or absence of the extraordinarily stable tetraloop 5'-GGAC(UUCG)GUCC-3' or of a structure such as a pseudoknot. Particular [...] structures often implicate [...] nucleic acid:protein binding.
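The fingerprint idea can be illustrated with a small Python sketch. It covers only a few of the listed features (unbonded regions and interior loops binned by length, hairpin loops and their closing pair, and a 40-60% GC bit), and it takes already-predicted structural elements as input rather than predicting them. Fingerprints are compared with the Tanimoto coefficient, a common choice for binary fingerprints that the text itself does not specify. The bin layout and all names are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, List, Optional, Tuple

# Hypothetical bin layout. Interior-loop bins follow the text (2-6, 7-20, >20);
# the 1-6 bin for unbonded regions is an assumption (its bound is illegible).
LENGTH_BINS: Dict[str, List[Tuple[int, int]]] = {
    "unbonded": [(1, 6), (7, 20)],
    "interior_loop": [(2, 6), (7, 20)],
}

def _bit_names() -> List[str]:
    names = []
    for kind, bins in LENGTH_BINS.items():
        names.append(f"{kind}:any")
        names += [f"{kind}:{lo}-{hi}" for lo, hi in bins]
        names.append(f"{kind}:>{bins[-1][1]}")
    names += ["hairpin:any", "hairpin:closed_GC", "hairpin:closed_TA", "gc:40-60"]
    return names

BIT_INDEX = {name: i for i, name in enumerate(_bit_names())}

def _length_bit(kind: str, length: int) -> Optional[str]:
    bins = LENGTH_BINS[kind]
    for lo, hi in bins:
        if lo <= length <= hi:
            return f"{kind}:{lo}-{hi}"
    return f"{kind}:>{bins[-1][1]}" if length > bins[-1][1] else None

def fingerprint(elements: Iterable[Tuple[str, int, str]], gc_fraction: float) -> FrozenSet[int]:
    """Return the set of 'on' bit positions.

    `elements` are (kind, length, closing_pair) tuples from some secondary
    structure prediction (not shown); closing_pair is "" when irrelevant.
    """
    on = set()
    for kind, length, closing in elements:
        if kind in LENGTH_BINS:
            on.add(BIT_INDEX[f"{kind}:any"])
            label = _length_bit(kind, length)
            if label is not None:
                on.add(BIT_INDEX[label])
        elif kind == "hairpin":
            on.add(BIT_INDEX["hairpin:any"])
            if closing in ("GC", "CG"):
                on.add(BIT_INDEX["hairpin:closed_GC"])
            elif closing in ("TA", "AT"):
                on.add(BIT_INDEX["hairpin:closed_TA"])
    if 0.40 <= gc_fraction <= 0.60:
        on.add(BIT_INDEX["gc:40-60"])
    return frozenset(on)

def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Bits 'on' in both fingerprints divided by bits 'on' in either."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0
```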

Suppose that m aptamers bind the query protein and n aptamers bind a reference protein. The aptamer-based similarity of the target (query) protein to the reference protein may be calculated on the basis of any or all of the m x n possible comparisons. Many approaches are possible, including, but not limited to:

1. comparing the highest affinity query protein-binding aptamer with the highest affinity reference protein-binding aptamer;
2. making all m x n comparisons and averaging the results;
3. calculating a weighted average of the m x n comparisons. Relative affinities may be assigned to the m query-binding aptamers on the basis of their affinities, or their logarithms. The same is done with the n reference-binding aptamers. The similarity score for each of the m x n comparisons is then weighted [...] by the relative affinities of the query-binding and reference-binding aptamers;
4. developing a consensus sequence for the m query-binding aptamers and a consensus sequence for the n reference-binding aptamers. The two consensus sequences are then examined [...];
5. making all (m x n) comparisons and selecting the highest score;
6. making all (m x n) comparisons and selecting the lowest score;
7. making all (m x n) comparisons and selecting the median score.
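Several of the numbered approaches can be computed from a single m x n matrix of pairwise scores, as in the Python sketch below. The consensus-sequence approach (4) is omitted; the weighting used for approach 3 (the product of the two aptamers' relative affinities) is an assumption; and the toy score `fraction_identical` merely stands in for whatever aptamer comparison is actually used. All names are hypothetical.

```python
from statistics import mean, median
from typing import Callable, Dict, Sequence

def aggregate_similarity(query: Sequence[str], reference: Sequence[str],
                         score: Callable[[str, str], float],
                         query_aff: Sequence[float],
                         ref_aff: Sequence[float]) -> Dict[str, float]:
    """Summaries of the m x n aptamer comparison matrix.

    Affinities are 'larger = tighter binding' weights (e.g. 1/Kd or -log Kd).
    """
    m, n = len(query), len(reference)
    matrix = [[score(q, r) for r in reference] for q in query]
    flat = [s for row in matrix for s in row]
    best_q = max(range(m), key=lambda i: query_aff[i])
    best_r = max(range(n), key=lambda j: ref_aff[j])
    weighted = sum(query_aff[i] * ref_aff[j] * matrix[i][j]
                   for i in range(m) for j in range(n)) / (sum(query_aff) * sum(ref_aff))
    return {
        "best_affinity_pair": matrix[best_q][best_r],  # approach 1
        "mean": mean(flat),                            # approach 2
        "affinity_weighted_mean": weighted,            # approach 3 (assumed weighting)
        "max": max(flat),                              # approach 5
        "min": min(flat),                              # approach 6
        "median": median(flat),                        # approach 7
    }

def fraction_identical(a: str, b: str) -> float:
    """Toy score for equal-length sequences (stand-in for a real comparison)."""
    return sum(x == y for x, y in zip(a, b)) / max(len(a), len(b))

print(aggregate_similarity(["AAAA", "AATT"], ["AAAA", "TTTT"],
                           fraction_identical, [2.0, 1.0], [1.0, 3.0]))
```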

After functionality sequences have been determined for a sufficient number of known targets, [...] new targets of unknown structure [...] are footprinted, e.g., using Pt2(pop)4, [...]. [...] the database for known targets. Initial screening of small molecule libraries is then directed toward [...] of similar size and functionality to those [...] targets that exhibit high functionality sequence homology to the unknown target.

In one embodiment, this information is entered into a set of relationally linked databases. We have already discussed protein, drug, and protein-drug interaction databases. Clearly, we may also provide a database listing all aptamers which bind at least one protein in the database. The aptamer database has a many-to-many relationship with the protein database. Hence, to normalize this relationship, we would like an aptamer-protein interaction database, in which each record includes an aptamer ID field (a link to an aptamer database) and a protein ID field (a link to a protein database). The record may optionally identify the contact sites. Secondary structure may be indicated in either the aptamer record (if invariant) or in the aptamer-protein interaction record (if affected by protein binding).
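One way to realize the aptamer-protein interaction table is sketched below using Python's built-in sqlite3 module. Table names, column names and the sample rows are hypothetical. The key element is the junction table: its composite primary key records each aptamer-protein pair once, and it carries the optional contact-site field and a binding-dependent secondary-structure field.

```python
import sqlite3
from typing import List

SCHEMA = """
CREATE TABLE protein (
    protein_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL
);
CREATE TABLE aptamer (
    aptamer_id          INTEGER PRIMARY KEY,
    sequence            TEXT NOT NULL,
    secondary_structure TEXT              -- filled in when the structure is invariant
);
-- Junction table that normalizes the many-to-many aptamer/protein relationship.
CREATE TABLE aptamer_protein (
    aptamer_id          INTEGER NOT NULL REFERENCES aptamer(aptamer_id),
    protein_id          INTEGER NOT NULL REFERENCES protein(protein_id),
    contact_sites       TEXT,             -- optional, e.g. footprinted positions
    secondary_structure TEXT,             -- filled in when binding alters the structure
    PRIMARY KEY (aptamer_id, protein_id)
);
"""

def proteins_bound_by(conn: sqlite3.Connection, aptamer_id: int) -> List[str]:
    rows = conn.execute(
        "SELECT p.name FROM protein AS p "
        "JOIN aptamer_protein AS ap ON p.protein_id = ap.protein_id "
        "WHERE ap.aptamer_id = ? ORDER BY p.name",
        (aptamer_id,),
    )
    return [name for (name,) in rows]

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO protein VALUES (?, ?)", [(1, "lysozyme"), (2, "thrombin")])
conn.execute("INSERT INTO aptamer VALUES (1, 'GGACTTCGGTCC', NULL)")  # placeholder sequence
conn.executemany("INSERT INTO aptamer_protein VALUES (?, ?, ?, ?)",
                 [(1, 1, "5,6,9", None), (1, 2, None, None)])
print(proteins_bound_by(conn, 1))  # ['lysozyme', 'thrombin']
```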

Other Protein Descriptors

Possible additional descriptors for the [...] include the following:

Amino Acid Composition
Structure
    Predicted Overall Alpha Helicity
    Predicted Number of Alpha Helices or Beta Strands
    Disulfide Bond Topology
    Predicted Surface Area-to-Volume Ratio
Sequence
    Similarity Scores [...] Proteins
Gross Physical Characteristics
    Molecular Weight
    Isoelectric Point
    Thermostability
    Overall Hydrophobicity
    Overall Aromaticity
    CHO content
Biological Activity (various)






Compound Library

[...] suffer to the same degree from the pharmaceutical disadvantages of peptides [...].

In designing a compound library, [...] the methods of molecular [...] typically used to obtain new drugs. Three [...] are identified: disjunction, [...] to identify its component pharmacophoric moieties; conjunction, in which [...] moieties [...] are joined, covalently or noncovalently, to form a [...]; and alteration, in which one moiety is replaced by another, which may be similar or different, but which is not in effect [...]. The use of the terms "disjunction" and "conjunction" is intended only to [...] the structural relationship of the end product to the original [...], [...] actually synthesized [...]; it is possible that the two are the same.

[...] of neostigmine [...] subsequent [...], as illustrated by demecarium (1956) and ambenonium (1956).

Alterations may modify the [...] distribution of an original moiety. Alterations include ring closing or opening, formation of lower or higher homologues, introduction or saturation of double bonds, introduction of optically active centers, introduction, removal or replacement of bulky groups, isosteric substitution, changes in the position or orientation of [...] groups, [...] electrostatic or [...], and/or [...]. [...] include -NH2, [...], -F, -Cl, -Br, -OH, [...], -SH, [...], -CH=CH-, [...], and [...].



The substituents may also include those which increase or decrease electronic density in conjugated systems. The former (+R) groups include -CH3, -CR3, [...], -Cl, -Br, -I, -OH, -OR, -OCOR, -SH, -SR, -NH2 and -NR2. The latter (-R) groups include -NO2, -CN, -CHO, -COR, -COOH, -COOR, [...].

Synthetically speaking, the modifications may be achieved by a variety of unit processes, including nucleophilic and electrophilic substitution, reduction and oxidation, addition and elimination, double bond cleavage, and cyclization.

In constructing a library, a compound, or a family of compounds, having one or more pharmacological activities (which need not be related to the known or suspected activities of the target protein) may be disjoined into two or more known or potential pharmacophoric moieties. Analogues of each of these moieties may be identified, and mixtures of these analogues reacted so as to reassemble compounds which have some similarity to the original lead compound. It is not necessary that all members of the library possess moieties analogous to all of the moieties of the lead compound.

The design of a library may be illustrated by the example of the benzodiazepines. Several benzodiazepine drugs, including chlordiazepoxide, diazepam and oxazepam, have been used as anti-anxiety drugs. Derivatives of benzodiazepines have widespread biological activities; derivatives have been reported to act not only as anxiolytics, but also as cholecystokinin (CCK) receptor subtype [...] receptor, platelet activating factor, and HIV transactivator Tat antagonists, and as GPIIbIIIa, reverse transcriptase and ras farnesyltransferase inhibitors.

The benzodiazepine structure has been disjoined into a 2-aminobenzophenone, an amino acid and an alkylating agent. See Bunin, et al., Proc. Nat. Acad. Sci. USA, 91:4708 (1994). Since only a few 2-aminobenzophenone derivatives are commercially available, it was later disjoined into a 2-aminoarylstannane, an acid chloride, an amino acid and an alkylating agent. Bunin, et al., Meth. Enzymol., 267:448 (1996). The arylstannane may be considered the core structure upon which the other moieties are substituted, or all four may be considered equals which are conjoined to make each library member.

A basic library synthesis plan and member structure is shown in [...], incorporated by reference in its entirety. The acid chloride building block introduces variability at the R1 site, the amino acid at the R2 site, and the alkylating agent at the R3 site. The R4 site is inherent in the arylstannane. Bunin, et al. generated a 1,4-benzodiazepine library of 11,200 different derivatives prepared from 20 acid chlorides, 35 amino acids, and 16 alkylating agents. (No diversity was introduced at R4; this group was used to couple the molecule to a solid phase.) According to the Available Chemicals Directory (MDL Information Systems, San Leandro, CA), [...] acid chlorides, [...] amino acids and [...] alkylating agents were available for purchase (and more, of course, could be synthesized). The particular moieties used were chosen to [...] structural dispersion, while limiting their numbers to those conveniently synthesized in the wells of a microtiter plate. In choosing between structurally similar compounds, preference was given to the [...] substituted compound.

[...] groups. Among the aliphatic groups, both acyclic and cyclic [...]. [...] feasible to introduce a branched aliphatic. The aromatic [...], fused or not, substituted or not, and with heteroatoms or not. The secondary [...]. While not used, spacer moieties, such as -O-, -S-, -OO-, -NH-, and [...], could have been incorporated.

Instead of a 1,4-benzodiazepine as a core structure, one may use a 1,4-benzodiazepine-2,5-dione structure.

As noted by Bunin, et al., it is advantageous, although not necessary, to use a linkage strategy which leaves no trace of the linking functionality, as this permits construction of a more diverse library.

Other combinatorial nonoligomeric compound libraries known or suggested in the art have been based on carbamates, mercaptoacylated pyrrolidines, [...], and acylamino ethers (made from amino alcohols [...]).

DeWitt, et al., Proc. Nat. Acad. Sci. USA, [...] (1993), describes the simultaneous but separate synthesis of 40 discrete hydantoins and 40 discrete benzodiazepines. They carry out their synthesis on a solid support ([...] tube), in an array format, as opposed to other conventional simultaneous synthesis techniques (e.g., in a [...] or on a pin). The hydantoins were synthesized by first [...] deprotecting and then treating each of five amino acid resins with each of eight isocyanates. The benzodiazepines were synthesized by treating each of five deprotected amino acid resins with each of eight 2-aminobenzophenone imines.

Chen, et al., [...], described the preparation of a [...] combinatorial library of formate esters. A [...] preparation was "split" into three portions, each of which was reacted with one of three different [...] reagents. The portions were combined, and then divided into three, each of which was reacted with a different Michael donor. Compound identity was found to be determinable by gas chromatography-mass spectrometry.

[...] the combinatorial synthesis of [...] thiazolidinones and metathiazanones. These libraries are made by [...] amines, carbonyl compounds, [...] under cyclization conditions.

Ellman, USP 5,545,568 (1996), describes the combinatorial synthesis of benzodiazepines, [...] mimetics, and glycerol-based compounds. See also Ellman, USP 5,288,514.

Summerton, USP 5,506,337 (1996), discloses methods of preparing a combinatorial library formed predominantly of morpholino subunit structures.

[...] combinatorial libraries are reviewed generally in [...]. [...] one or more moieties of the following types may be incorporated into compounds of the library, as many drugs fall into one or more of the following categories:

acetals
acids
alcohols
amides
amidines
amines
amino acids
amino alcohols
amino ethers
amino ketones
ammonium compounds
azo compounds
enols
esters
ethers
glycosides
guanidines
halogenated compounds
hydrocarbons
ketones
lactams
lactones
mustards
nitro compounds
nitroso compounds
organominerals
phenones
quinones
semicarbazones
stilbenes
sulfonamides
sulfones
thiols
thioamides
thioureas
ureas
ureides
urethans

Without attempting to exhaustively recite all pharmacological classes of drugs, or all drug structures, one or more compounds of the chemical structures listed below have been found to exhibit the indicated pharmacological activity, and these structures, or derivatives, may be used as design elements in screening for further compounds of the same or different activity. (In some cases, one or more lead drugs of the class are indicated.)
hypnotics
    higher alcohols (clomethiazole)
    aldehydes (chloral hydrate)
    carbamates (meprobamate)
    acyclic ureides (acetylcarbromal)
    barbiturates (barbital)
    benzodiazepines (diazepam)
anticonvulsants
    barbiturates (phenobarbital)
    hydantoins (phenytoin)
    oxazolidinediones (trimethadione)
    succinimides (phensuximide)
    acylureides (phenacemide)
narcotic analgesics
    morphines
    phenylpiperidines (meperidine)
    diphenylpropylamines (methadone)
    phenothiazines (methotrimeprazine)
analgesics, antipyretics, antirheumatics
    salicylates (acetylsalicylic acid)
    p-aminophenol (acetaminophen)
    5-pyrazolone (dipyrone)
    3,5-pyrazolidinedione (phenylbutazone)
    arylacetic acid (indomethacin)
    adrenocortical steroids (cortisone, dexamethasone, prednisone, triamcinolone)
    anthranilic acids
neuroleptics
    phenothiazine (chlorpromazine)
    thioxanthene (chlorprothixene)
    reserpine
    butyrophenone (haloperidol)
anxiolytics
    propanediol carbamates (meprobamate)
    benzodiazepines (chlordiazepoxide, diazepam, oxazepam)
antidepressants
    tricyclics (imipramine)
muscle relaxants
    propanediols and carbamates (mephenesin)
CNS stimulants
    xanthines (caffeine, theophylline)
    phenylalkylamines (amphetamine)
    (Fenetylline is a conjunction of theophylline and amphetamine.)
    oxazolidinones (pemoline)
cholinergics
    choline esters (acetylcholine)
    N,N-dimethylcarbamates
adrenergics
    aromatic amines (epinephrine, isoproterenol, phenylephrine)
    alicyclic amines (cyclopentamine)
    aliphatic amines (methylhexaneamine)
    imidazolines (naphazoline)
anti-adrenergics
    indolethylamine alkaloids (dihydroergotamine)
    imidazoles (tolazoline)
    benzodioxans (piperoxan)
    beta-haloalkylamines (phenoxybenzamine)
    dibenzazepines (azapetine)
    hydrazinophthalazines (hydralazine)
antihistamines
    ethanolamines (diphenhydramine)
    ethylenediamines (tripelennamine)
    alkylamines (chlorpheniramine)
    piperazines (cyclizine)
    phenothiazines (promethazine)
local anesthetics
    benzoic acid esters (procaine, isobucaine, cyclomethycaine)
    basic amides (dibucaine)
    anilides, toluidides, 2,6-xylidides (lidocaine)
    tertiary amides (oxetacaine)
vasodilators
    polyol nitrates (nitroglycerin)
diuretics
    xanthines
    thiazides (chlorothiazide)
    sulfonamides (chlorthalidone)
anthelmintics
    cyanine dyes
antimalarials
    4-aminoquinolines
    8-aminoquinolines
    pyrimidines
    biguanides
    acridines
    dihydrotriazines
    sulfonamides
    sulfones
antibacterials
    antibiotics
        penicillins
        cephalosporins
        octahydronaphthacenes (tetracycline)
    sulfonamides
    nitrofurans
    cyclic amines
    naphthyridines
    xylenols
antitumor
    alkylating agents
        nitrogen mustards
        aziridines
        methanesulfonate esters
        epoxides
    amino acid antagonists
    folic acid antagonists
    pyrimidine antagonists
    purine antagonists
antiviral
    adamantanes
    nucleosides
    thiosemicarbazones
    inosines
    amidines and guanidines
    isoquinolines
    benzimidazoles
    piperazines

For pharmacological classes, see, e.g., Goth, Medical Pharmacology: Principles and Concepts (C.V. Mosby Co.: [...]); Korolkovas and Burckhalter, Essentials of Medicinal Chemistry (John Wiley & Sons, Inc.: [...]). For synthetic methods, see, e.g., Warren, Organic Synthesis: The Disconnection Approach (John Wiley & Sons, Ltd.: 1982); Fuson, [...] Organic Compounds (John Wiley & Sons: 1966); Payne and Payne, How to do an Organic Synthesis (Allyn and Bacon, Inc.: 1969); [...], Protective Groups in Organic Synthesis [...]. For [...] of substituents, see, e.g., Hansch and Leo, Substituent Constants for Correlation Analysis in Chemistry and Biology (John Wiley & Sons: 1979).

The library is preferably synthesized so that the individual members remain identifiable, so that, if a member is shown to be active, it is not necessary to analyze it. Several methods of identification have been proposed, including:

(1) encoding, i.e., the attachment to each member of an identifier moiety which is more readily identified than the member proper. This has the disadvantage that the tag may itself influence the activity of the conjugate.

(2) spatial addressing, e.g., each member is synthesized only at a particular coordinate on or in a matrix, or in a particular chamber. This might be, for example, the location of a particular pin, or a particular well on a microtiter plate, or inside a "tea bag".

The present invention is not limited to any particular form of identification.

However, it is possible to simply characterize those members of the library which are found to be active, based on the [...] spectroscopic indicia of the various building blocks.






Solid phase synthesis permits greater control over which derivatives are formed. However, the solid phase could interfere with activity. To overcome this problem, some or all of the molecules of each member could be liberated after synthesis but before screening.

Lead Generation

As a result of querying the database with descriptor data for a query protein, the user receives a list of reference proteins. For each reference protein, a similarity score and a list of known antagonists (or other modulators) is given. These are the "drug leads". The antagonists are weighted by the similarity score of their reference protein. If available and desired, they may also be weighted by their potency against their corresponding reference protein, and/or by other physicochemical characteristics of interest, e.g., lipophilicity.

This invention contemplates the construction of a composite combinatorial compound library which is biased in favor of compounds (both scaffoldings and substituents) which are structurally similar to the drug leads.

Preferably, in view of the wide variety of "drug-type" compound combinatorial libraries already available, the [...] simple [...]. Each drug lead is compared, using structural descriptors, to each candidate simple combinatorial library. The structural descriptors [...] those listed in Patterson, et al. (1996), Klebe and Abraham (1993), Cummins, et al. (199[...]), [...]. Conventional mathematical methods may be used to compare the descriptors.

The 2D fingerprint method described in Matter, et al. (1997) is of particular interest. In essence, the compound is examined for the presence or absence of particular molecular fragments, the results being encoded in a binary format. In [...] of limited size [...], fewer bits than the total number of unique fragments in the compounds of the database. In addition, the presence of 60 specific functional groups, rings or atoms was encoded in 60 of the total 968 bits. Details are given in UNITY Chemical Information Software, version [...], Reference Guide, pp. 45-58, Tripos Inc., 1699 Hanley Rd., St. Louis, MO 63144.

A similar approach is described in Martin, et al., J. Med. Chem., 38:1431-6 (1995). A "Daylight fingerprint" routine was used to [...] substructures up to seven bonds long and set one bit in a 2048-bit string for each fragment found. A "hashing" algorithm randomly assigned each fragment to one of the possible bits.
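The path-based hashing scheme can be sketched as follows. This is not the Daylight implementation. It assumes a molecule is given as a list of element symbols plus an adjacency list, it ignores bond orders, and it uses SHA-1 from Python's hashlib as a stand-in for the hashing step. Paths of up to seven bonds are hashed into a 2048-bit space, as in the description above. All function names are hypothetical.

```python
import hashlib
from typing import Dict, Iterator, List, Set, Tuple

Graph = Dict[int, List[int]]   # atom index -> indices of bonded neighbours

def linear_paths(graph: Graph, max_bonds: int = 7) -> Iterator[Tuple[int, ...]]:
    """Every simple path of 0..max_bonds bonds, starting from every atom."""
    def extend(path: Tuple[int, ...]) -> Iterator[Tuple[int, ...]]:
        yield path
        if len(path) - 1 < max_bonds:
            for nb in graph[path[-1]]:
                if nb not in path:
                    yield from extend(path + (nb,))
    for start in graph:
        yield from extend((start,))

def path_key(path: Tuple[int, ...], atoms: List[str]) -> str:
    """Direction-independent label such as 'C-C-O' (bond orders ignored here)."""
    forward = "-".join(atoms[i] for i in path)
    backward = "-".join(atoms[i] for i in reversed(path))
    return min(forward, backward)

def hashed_fingerprint(atoms: List[str], graph: Graph,
                       n_bits: int = 2048, max_bonds: int = 7) -> Set[int]:
    """Hash each path label to one of n_bits positions and set that bit."""
    bits = set()
    for path in linear_paths(graph, max_bonds):
        digest = hashlib.sha1(path_key(path, atoms).encode()).digest()
        bits.add(int.from_bytes(digest[:4], "big") % n_bits)
    return bits

# Heavy-atom skeletons of ethanol and 1-propanol.
ethanol_atoms, ethanol = ["C", "C", "O"], {0: [1], 1: [0, 2], 2: [1]}
propanol_atoms, propanol = ["C", "C", "C", "O"], {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
```

Because every fragment of ethanol also occurs in propanol, the ethanol fingerprint is a subset of the propanol fingerprint, which is the property such hashed fingerprints rely on for substructure-style comparisons.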

[...] molecular frameworks identified by Bemis and Murcko, J. Med. Chem., 39:2887-93 (1996), [...] be incorporated into binary descriptors of a 2D fingerprint type. [...] descriptors may be used instead of, or in addition to, 2D fingerprints.

The candidate library is assigned a weight which is a function of (i) the drug lead's similarity to the library, (ii) the similarity of its reference protein to the query protein, and, optionally, (iii) the potency or other [...] of the drug lead. [...] the scaffold. These weights then determine the predominance of that candidate library in the composite library. Thus, if [...] the benzodiazepine library [...] as big as the carbamate library.

Each drug lead may also be used to evaluate possible substituents. Each substituent is scored on the basis of [...] structural descriptors. The concentration of each candidate substituent in the combinatorial reaction mix may then be governed by its similarity score. Low-scoring candidate substituents may be omitted entirely, or merely reduced in concentration. Conversely, the mix may be limited to high-scoring substituents, or their concentrations may merely be increased. The concentration changes need not be strictly proportional to the scores.
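A minimal sketch of score-governed substituent concentrations follows. The cut-off below which a substituent is omitted and the exponent that makes the scaling non-proportional are hypothetical parameters chosen only for illustration.

```python
from typing import Dict

def substituent_mix(scores: Dict[str, float], total_conc: float = 1.0,
                    drop_below: float = 0.2, exponent: float = 2.0) -> Dict[str, float]:
    """Relative concentrations of candidate substituents in the reaction mix.

    scores: similarity of each substituent to the drug lead, in [0, 1].
    Substituents scoring below `drop_below` are omitted entirely; the rest are
    weighted by score**exponent, so the changes are not strictly proportional.
    """
    kept = {name: s ** exponent for name, s in scores.items() if s >= drop_below}
    norm = sum(kept.values())
    if norm == 0:
        return {}
    return {name: total_conc * v / norm for name, v in kept.items()}

print(substituent_mix({"a": 1.0, "b": 0.5, "c": 0.1}))  # "c" is dropped
```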

If a reference protein is very similar to the query protein, it should have a strong influence on the library composition, and, if its similarity is only modest, its influence should be correspondingly weaker. Likewise, a drug lead that closely resembles a candidate simple library should imply a strong enrichment of that library, and one which only weakly resembles it should imply a more modest enrichment.

One possible mathematical approach is shown below. Let

$W_L$ = original absolute weight of library type $L$ in the composite library [0..1]
$S_p$ = similarity of the query protein to reference protein $p$ [0..1]
$S_{dL}$ = similarity of reference protein drug $d$ to library type $L$ [0..1]
$Q_d$ = quality of drug $d$ as a drug lead in general [0..1]

The new absolute weight of library type $L$, adjusted for drug lead $d$ of reference protein $p$, is

$$W'_L = W_L (1 - S_p) + W_L S_p S_{dL} Q_d$$

If $Q_d = 1$, then [...]. The new absolute weights are converted to new relative weights by dividing each by the sum of $W'_L$ for all $L$.
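The update can be written directly in Python. The sketch below applies the formula for a single drug lead and then normalizes the new absolute weights to relative weights. Library-type names are simply dictionary keys and the function name is hypothetical. With equal starting weights of 0.5, S_p = 0.2, and lead similarities of 0.9 (benzodiazepine) and 0.05 (carbamate), it returns roughly 0.55 and 0.45.

```python
from typing import Dict

def adjust_weights(weights: Dict[str, float], s_p: float,
                   s_dl: Dict[str, float], q_d: float = 1.0) -> Dict[str, float]:
    """One update of library-type weights for drug lead d of reference protein p.

    weights: W_L for each library type L
    s_p:     similarity of the query protein to reference protein p, in [0, 1]
    s_dl:    similarity of drug lead d to each library type L, in [0, 1]
    q_d:     quality of drug d as a lead, in [0, 1]
    Returns relative weights that sum to 1.
    """
    # W'_L = W_L (1 - S_p) + W_L S_p S_dL Q_d
    new_abs = {lib: w * (1 - s_p) + w * s_p * s_dl[lib] * q_d
               for lib, w in weights.items()}
    total = sum(new_abs.values())
    return {lib: w / total for lib, w in new_abs.items()}

print(adjust_weights({"benzodiazepine": 0.5, "carbamate": 0.5}, 0.2,
                     {"benzodiazepine": 0.9, "carbamate": 0.05}))
```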

Suppose one is contemplating synthesis of a composite library of (a) benzodiazepines and (b) carbamates. In the absence of other information, one might make the library 50% benzodiazepines and 50% carbamates. Hypothesizing that the target protein was 20% similar to a reference protein [...] antagonist which was 90% similar to a benzodiazepine and 5% similar to a carbamate, we could recalculate weights as follows:

benzodiazepines
.5 x .8 = .4
.5 x .2 x .9 = .09
.4 + .09 = .49

carbamates
.5 x .8 = .4
.5 x .2 x .05 = .005
.4 + .005 = .405

.49 + .405 = .895
.49/.895 = .55 new benzodiazepine fraction
.405/.895 = .45 new carbamate fraction

Now assume that the target protein is 80% similar to the same reference protein. Then the calculation becomes:

benzodiazepines
.5 x .2 = .1
.5 x .8 x .9 = .36
.1 + .36 = .46

carbamates
.5 x .2 = .1
.5 x .8 x .05 = .02
.1 + .02 = .12

.46 + .12 = .58
.46/.58 = .79 new benzodiazepine fraction
.12/.58 = .21 new carbamate fraction

If several reference proteins are retrieved, one may adjust the weights based on the most similar protein, then the next most similar protein, and so on. For example, let us assume that the most similar protein had a similarity of 80%, and an antagonist which was 90% benzodiazepine-type and 5% carbamate-type. If so, the adjusted weights would, as stated above, be .79 benzodiazepine and .21 carbamate. If the next best protein were 20% similar, and its antagonist were 10% benzodiazepine-like and 70% carbamate-like, then:

benzodiazepines
.79 x .8 = .632
.79 x .2 x .1 = .016
.632 + .016 = .648

carbamates
.21 x .8 = .168
.21 x .2 x .7 = .029
.168 + .029 = .197

.648 + .197 = .845
.648/.845 = .77 new benzodiazepine fraction
.197/.845 = .23 new carbamate fraction
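The sequential procedure, updating first for the most similar reference protein and then for the next, can be sketched as below. Each hit is an assumed (S_p, S_dL, Q_d) triple; sorting by S_p enforces the "most similar first" order whatever order the hits arrive in. The function name is hypothetical. For the two-protein example (80% then 20% similar), it gives about 0.77 benzodiazepine and 0.23 carbamate.

```python
from typing import Dict, List, Tuple

Hit = Tuple[float, Dict[str, float], float]   # (S_p, S_dL per library type, Q_d)

def sequential_adjust(weights: Dict[str, float], hits: List[Hit]) -> Dict[str, float]:
    """Apply one weight update per reference protein, most similar protein first."""
    current = dict(weights)
    for s_p, s_dl, q_d in sorted(hits, key=lambda h: h[0], reverse=True):
        # W'_L = W_L (1 - S_p) + W_L S_p S_dL Q_d, then renormalize to sum to 1.
        raw = {lib: w * (1 - s_p) + w * s_p * s_dl[lib] * q_d
               for lib, w in current.items()}
        total = sum(raw.values())
        current = {lib: w / total for lib, w in raw.items()}
    return current

hits = [(0.2, {"benzodiazepine": 0.1, "carbamate": 0.7}, 1.0),
        (0.8, {"benzodiazepine": 0.9, "carbamate": 0.05}, 1.0)]
print(sequential_adjust({"benzodiazepine": 0.5, "carbamate": 0.5}, hits))
```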



If, for a given reference protein, more than one drug lead [...].

It will be appreciated that this is only one of many possible methods of adjusting library composition. [...] lead [...] of the candidate simple combinatorial libraries. This is equivalent to giving it a weight of 1 and the others a weight of 0.

Drug leads are not equal in value. The drugs will vary in [...] specificity, [...], ease of synthesis, cost of production, etc. Those factors which the [...] may be combined by any rational method. If only potency is considered, then the most potent [...] logarithms of their potencies. [...] 10-[...], a drug with an IC50 of 10-[...] might have a Q_d of 1/6. Of course, [other functions in] which higher potencies yield higher values might be used.

By means of a quality factor, "negative teachings" may also be taken into account. If the database includes compounds known not to inhibit a "retrieved" reference protein, then the library could be designed to reduce the representation of similar compounds. The formula given previously will do that, since $W'_L$ is lower if $Q_d$ is lower, and $Q_d$ is lower if [...] is lower. The extent to which $Q_d$ influences $W'_L$ is dependent on both $S_p$ and $S_{dL}$. Other formulae could be used, e.g.,

$$W'_L = W_L (1 - S_p) + (W_L S_p S_{dL})^{Q'}$$

where $Q' = 1/Q_d$ and $0 < Q_d < \infty$ (the neutral point is $Q_d = 1$).
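The two formulae can be compared numerically with the sketch below (function names are hypothetical). With Q_d = 1 they agree. A lead with Q_d below 1, such as one reflecting a negative teaching, lowers the weight under the exponent form, and Q_d above 1 raises it. The placement of the exponent over the whole product follows the reconstructed formula and should be treated as an assumption.

```python
def weight_multiplicative(w_l: float, s_p: float, s_dl: float, q_d: float) -> float:
    """First formula: W'_L = W_L (1 - S_p) + W_L S_p S_dL Q_d, with 0 <= Q_d <= 1."""
    return w_l * (1 - s_p) + w_l * s_p * s_dl * q_d

def weight_exponent(w_l: float, s_p: float, s_dl: float, q_d: float) -> float:
    """Alternative: W'_L = W_L (1 - S_p) + (W_L S_p S_dL)^(1/Q_d), 0 < Q_d < infinity."""
    if q_d <= 0:
        raise ValueError("Q_d must be positive")
    return w_l * (1 - s_p) + (w_l * s_p * s_dl) ** (1 / q_d)

for q in (0.5, 1.0, 2.0):
    print(q, weight_exponent(0.5, 0.8, 0.9, q))
```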
In a similar manner, possible substituents for the [...] can be evaluated for similarity to the database-generated drug leads.
Examples of candidate simple [...] include:

Cyclic Compounds Containing One Heteroatom
    Heteronitrogen
        pyrroles
            pentasubstituted pyrroles
        pyrrolidines
        pyrrolines
        prolines
        indoles
            beta-carbolines
        pyridines
            dihydropyridines
            1,4-dihydropyridines
            pyrido[2,3-d]pyrimidines
            tetrahydro-3H-imidazo[4,5-c]pyridines
        isoquinolines
            tetrahydroisoquinolines
        quinolones
        beta-lactams
        azabicyclo[4.3.0]nonen-8-one amino acid
    Heterooxygen
        furans
            tetrahydrofurans
            2,5-disubstituted tetrahydrofurans
        pyrans
            hydroxypyranones
            tetrahydroxypyranones
        gamma-butyrolactones
    Heterosulfur
        sulfolenes
Cyclic Compounds with Two or More Heteroatoms
    Multiple heteronitrogens
        imidazoles
        pyrazoles
        piperazines
            diketopiperazines
            arylpiperazines
            benzylpiperazines
        benzodiazepines
            1,4-benzodiazepine-2,5-diones
        hydantoins
            5-alkoxyhydantoins
        dihydropyrimidines
            1,3-disubstituted-5,6-dihydropyrimidine-2,4-diones
        cyclic ureas
        cyclic thioureas
        quinazolines
            chiral 3-substituted quinazoline-2,4-diones
        triazoles
            1,2,3-triazoles
        purines
    Heteronitrogen and Heterooxygen
        diketomorpholines
        isoxazoles
        isoxazolines
    Heteronitrogen and Heterosulfur
        thiazolidines
            N-acylthiazolidines
        dihydrothiazoles
            2-methylene-2,3-dihydrothiazoles
        2-aminothiazoles
        thiophenes
            3-aminothiophenes
        4-thiazolidinones
        4-metathiazanones
        benzisothiazolones

For details on the synthesis of these libraries, see Nefzi, et al., Chem. Rev., 97:449-72 (1997), and references cited therein.

In designing the library, one may consider not only the drug leads retrieved from the [...], but also structures [...] or nucleic acid aptamers [...] the higher-ranked reference proteins.

The present invention is useful not only in [...] target. It may be desirable to inhibit a biochemical pathway. Several different proteins [...] that pathway. To decide which one [...], [each] protein might be used to [...]. [...] most potent and specific drug lead [...], having the greatest resemblance to [...] combinatorial library, becomes the target of choice.
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Examples 



Hypotl.aeie.1 Example 1 . Generation of DNA Apta-er 

The protein is immobilized, e.g., on filter paper, and subjected to binding by a library of [...] random oligonucleotides. The oligonucleotides may be DNA, RNA, or a DNA or RNA analogue in which the [...] functionality of the nucleotide is modified at the 2' position with substituents such as fluoro-, amino-, or methoxy-. The members of the library that bind to the target protein are separated and amplified using the polymerase chain reaction or another amplification scheme. The resulting pool is bound to the immobilized protein again, and the strong-binding fraction is selected. This process is repeated if need be, e.g., 5-15 times, until ligands of the [...]. Once [...] affinity is reached, the oligonucleotides are cloned into suitable host cells, and their sequences are determined by automated methods.
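The repeated bind-and-amplify cycle described above can be illustrated with a toy, deterministic model. This is a minimal sketch only, assuming that each species is retained on the immobilized protein in proportion to a hypothetical per-round retention fraction and that amplification is unbiased; the function name, the retention values, and the 90% stopping rule are illustrative assumptions, not part of the method.

```python
def selex(pool, retention, max_rounds=15, stop_fraction=0.9):
    """Toy deterministic model of repeated selection and amplification.

    pool:       dict mapping sequence -> fraction of the starting library (sums to 1)
    retention:  dict mapping sequence -> hypothetical fraction retained on the
                immobilized protein in one selection step (0..1)
    Returns the final pool and the number of rounds performed.
    """
    rounds = 0
    while rounds < max_rounds:
        # Selection: only the fraction of each species that binds the target is kept.
        bound = {seq: frac * retention[seq] for seq, frac in pool.items()}
        total = sum(bound.values())
        # Amplification (e.g. PCR): rescale the bound material back to a full pool,
        # preserving the relative abundances produced by selection.
        pool = {seq: amount / total for seq, amount in bound.items()}
        rounds += 1
        # Stop once a single ligand dominates the pool.
        if max(pool.values()) >= stop_fraction:
            break
    return pool, rounds


start = {"weak": 0.98, "medium": 0.01, "strong": 0.01}
retention = {"weak": 0.01, "medium": 0.1, "strong": 0.5}
final, n = selex(start, retention)
print(n, max(final, key=final.get))  # prints: 2 strong
```

Even though the strong binder starts at 1% of the pool, two rounds are enough for it to exceed 90% in this toy model, which is the qualitative behavior the repeated cycle relies on.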



Hypothetical Example 2 - Determination of Contact Sites of Aptamer on Target

The nucleic acid aptamers for a specific target are resynthesized and 3'- or 5'-labeled with 32P. Each radiolabeled aptamer [...] 100-500 µM Pt2(pop)4^4- and photolyzed at wavelengths between [...]. The radiolabeled oligomer is then [...] precipitated. A parallel control reaction is run without the protein. [...] photolyzed with and without the protein [...] polyacrylamide sequencing gel. The nucleotides where there is significant reaction without the protein but not with the protein are then classified as contact sites. This process is [...] target and [...] aptamers.

Suppose a DNA library is screened for binding to lysozyme (cf. R. Diamond, J. Mol. Biol. [...] (1974)) and yields an aptamer with the sequence 5'-TASCrGGCCAAASiecGAACALcCCTTG.
[The secondary structure of the aptamer] is predicted by standard methods (described above), which show that the bold bases are paired to give a seven-base stem with an AAA bulge and a CGAA loop. The aptamer is synthesized and radiolabeled on the 5' end with 32P. The labeled oligomer is reacted with Pt2(pop)4^4- by photolysis at 400 nm in phosphate-buffered saline and analyzed on a sequencing gel, which shows that Pt2(pop)4^4- induces scission of the oligonucleotide at every base in the sequence, giving a ladder of bands for the oligomer with approximately the same extent of scission at each nucleotide in the sequence.

The reaction is then repeated in the presence of enough protein to cause the majority of the oligonucleotide to be bound to the protein. The Pt2(pop)4^4- exhibits some reaction with the [protein], so the concentration of the Pt2(pop)4^4- must be higher than in the reaction of the oligonucleotide alone, and the reaction must be performed in a short time so that alteration of the structure of the nucleoprotein complex due to damage of the [protein is minimized]. Nevertheless, only the oligonucleotide is labeled, so only scission of the oligonucleotide is detected. The scission pattern visualized on a sequencing gel then shows the same relative reactivity at the nucleotides that do not contact the protein as in the reaction of the oligomer alone, and greatly attenuated reactivity at sites protected by the protein. Studies on crystallographically [...] experiment faithfully indicates the sites of contact of the protein on [...] (see Bremer, already cited).

The "footprint" of the protein on the DNA is determined by quantitating the extent of cleavage at each nucleotide in both reactions to form two histograms, normalizing the two histograms so that nucleotides that are clearly outside the binding site give the same intensity, and then assigning contact sites as [the nucleotides at which the protein] attenuates the relative intensity.
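The footprint assignment just described (normalize on nucleotides outside the binding site, then flag attenuated positions) might be sketched as follows. The 50% attenuation cutoff and the function name are hypothetical; the text does not specify a threshold.

```python
def assign_contacts(free, bound, reference, threshold=0.5):
    """Call protein contact sites from two cleavage histograms.

    free:       cleavage intensity at each nucleotide without protein
    bound:      cleavage intensity at each nucleotide with protein
    reference:  indices of nucleotides assumed to lie outside the binding site
    threshold:  hypothetical cutoff; a nucleotide is called a contact when its
                normalized intensity with protein is below threshold * free intensity
    """
    # Scale the with-protein histogram so the reference nucleotides match in total.
    scale = sum(free[i] for i in reference) / sum(bound[i] for i in reference)
    normalized = [b * scale for b in bound]
    return [i for i, (f, b) in enumerate(zip(free, normalized)) if b < threshold * f]


free = [10, 10, 10, 10, 10, 10, 10, 10]
bound = [5, 5, 5, 1, 1, 5, 5, 5]  # lower overall signal, strongly protected at 3 and 4
print(assign_contacts(free, bound, reference=[0, 1, 6, 7]))  # prints [3, 4]
```

Normalizing first matters here: without it, every position in the protein-bound lane would look attenuated simply because the overall signal is lower.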

For the hypothetical lysozyme aptamer described above, cleavage without the protein will give approximately the same intensity at all nucleotides. The cleavage intensity in the reaction with the protein is then normalized for the nucleotides on either end, and any site at which [...] less cleavage occurs is counted as a protein [contact] site. Cleavage is observed at [...] the end nucleotides and [...] the same with and without the protein [...] and AAA bulge would be protected [...]; this would imply that the protein recognizes the AAA bulge and [...].

Hypothetical Example 3 - Comparison of Two Aptamers

The RNA aptamers shown in Scheme 1, selected by SELEX for [...] the reverse [...] immunodeficiency virus (Chen [...] Biochemistry [...]), [...] that the underlined bases were determined [...] protein, as described in the previous example. Within the contact regions, the bold bases are [...] pairing to make two hairpins with [...]. At this point, the two functional sequences are [written] according to the codes, with, e.g., Au denoting an A contacting the protein that is base-paired to a U on the opposite strand, Ao corresponding to a single-stranded A, and so on. This part [...]. The [...] sequences [are scored] according to the (illustrative) scoring system shown in C. By this scheme, a single-stranded base is given a maximum score of 2, a mismatch is given a maximum score of 1.8, and a base pair is given a maximum score of 1. The longer of the two sequences is then chosen as the parent sequence, and the maximum score is calculated, which in this case is 26. The other sequence is then aligned with the parent sequence, allowing for gaps if necessary. It is [...] longer sequence as the parent sequence because this imposes an [...]. For a perfect match, the score is the same as the maximum score. For a partial match, the score is 0.5 for transposition of a base pair (e.g., substitution of Ua for Au), 0.5 for a pair/mismatch pair where the same base contacts the protein, and 0.9 for a mismatch/mismatch pair where the contacting base is the same. The summed comparison score is then divided by the maximum total score. The comparison of the two aptamers in Scheme 1 then gives a score of 11/26, or 0.423. The score would have been 4/26 for an identity matrix.

DNA aptamers would be scored in an identical manner, except that T would replace U. [...] in any of the [...]. It is not necessary that all four nucleotides (A, T/U, G, C) have the same maximum score or partial match scores. Also, the individual scores could instead be squared, with the sum of the squares of the comparison scores divided by the sum of the squares of the maximum scores.

Scheme 1 - Aptamer Comparison

A
5'-CAAACIJGGGUUarATTrTrTrrT^GUACAGCA - Dobbelstein 1995
5'-GUACCGAAUGUGCUUUUCGGCCGMZUGGC - [...]

B
Parent Sequence
=C-Gc-Cg.cg, So-Ao-Uo-Uo-Oo-Oo-„o-Oc-Gc-Cg-Cg
Maximum score = [...] = 26

C
Perfect Matches
Ao = 2     Uo = 2
Au = 1     Ua = 1
Go = 2     Co = 2
Gc = 1     Cg = 1
Aa = 1.8   Uu = 1.8   Ga = 1.8   Ca = 1.8
Ag = 1.8   Ug = 1.8   Gu = 1.8   Cu = 1.8
Ac = 1.8   Uc = 1.8   Gg = 1.8   Cc = 1.8

Partial Matches
Au/Ua = 0.5
Au/An = 0.5 (n <> u)
An/An' = 0.9 (n <> n')
[...] too. Thus, Cg/Gc would be a partial match, with a value of 0.5.
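The illustrative scoring system of Scheme 1 can be expressed as a short sketch. It assumes the two sequences have already been aligned position by position (gaps marked as None), since the alignment procedure is not specified, and it assumes that any combination not listed in part C of Scheme 1 scores 0. The (base, partner) encoding, with partner "o" for a single-stranded contact, mirrors the codes in the scheme; all function names are hypothetical.

```python
# Hypothetical encoding: (contacting base, partner), where partner is the
# lowercase base on the opposite strand or "o" for a single-stranded contact.
WATSON_CRICK = {("A", "u"), ("U", "a"), ("G", "c"), ("C", "g")}


def max_score(pos):
    """Maximum score of one parent position (perfect-match values in Scheme 1 C)."""
    _, partner = pos
    if partner == "o":
        return 2.0  # single-stranded contact, e.g. Ao
    if pos in WATSON_CRICK:
        return 1.0  # base pair, e.g. Cg
    return 1.8      # mismatch, e.g. Ga


def match_score(parent, other):
    """Score one aligned position; other is None where the alignment has a gap."""
    if other is None:
        return 0.0
    if parent == other:
        return max_score(parent)  # perfect match
    (b1, p1), (b2, p2) = parent, other
    if parent in WATSON_CRICK and other == (p1.upper(), b1.lower()):
        return 0.5  # transposed base pair, e.g. Cg vs Gc
    if b1 == b2 and "o" not in (p1, p2):
        if (parent in WATSON_CRICK) != (other in WATSON_CRICK):
            return 0.5  # pair vs mismatch with the same contacting base
        return 0.9      # two different mismatches with the same contacting base
    return 0.0  # assumption: every other combination scores 0


def compare(parent, aligned, squared=False):
    """Summed comparison score divided by the parent's maximum total score."""
    scores = [match_score(p, o) for p, o in zip(parent, aligned)]
    maxima = [max_score(p) for p in parent]
    if squared:  # alternative: sum of squares over sum of squared maxima
        return sum(s * s for s in scores) / sum(m * m for m in maxima)
    return sum(scores) / sum(maxima)


parent = [("C", "g"), ("A", "o"), ("U", "o"), ("G", "c")]
aligned = [("G", "c"), ("A", "o"), None, ("G", "a")]
print(compare(parent, aligned))  # prints 0.5
```

In this small example the transposed pair and the pair/mismatch position each earn 0.5, the single-stranded A earns its full 2, and the gap earns nothing, giving 3/6. The `squared=True` option corresponds to the sum-of-squares alternative mentioned in the text.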



Hypothetical Example 4 - Determination of Reactivity Descriptors

Suppose Ru(tpy)(bpy)O^2+ has a rate constant with methionine of [...] M^-1 s^-1 and a negligible rate constant with all of the other amino acids, and that the rate constant of Pt2(pop)4^4- with tryptophan is 1.0 x 10[...] M^-1 s^-1 while all of the other amino acids give negligible rate constants. Now glutamine synthetase and lysozyme are reacted with the two compounds in the folded, unfolded, and BioKey-bound states. The BioKey peptides are engineered to avoid the inclusion of methionine and tryptophan, so that cross-reaction with the BioKey is not an issue. The rate constants are normalized to the number of reactive residues for each reagent, i.e., the Ru(tpy)(bpy)O^2+ rate constants to the number of methionines and the Pt2(pop)4^4- rate constants to the number of tryptophans. The rate constants in the three states are then (hypothetically):

[Table: hypothetical rate constants for each protein and state]

Rate constants for Ru(tpy)(bpy)O^2+ are given as M^-1 s^-1, where the molar concentration is per total methionine in the two proteins. Rate constants for Pt2(pop)4^4- are given as M^-1 s^-1, where the molar concentration is per total tryptophan.



For the unfolded proteins, the [...] those for the amino acids [...] polymer. For glutamine synthetase, there are [...] surface methionines [...]; most of these methionines would be [...], and the rate constant drops (hypothetically [...]) [...] value of the folded protein [...]. [...] glutamine synthetase relies on reaction [...] tryptophans per subunit, both of which are at the [...] and therefore protected [...] the folded [...]. [One of the] two tryptophans is slightly exposed, so the rate constant [drops] (hypothetically, to about [...] of its initial value). This slightly exposed tryptophan is not near the [...], so binding the BioKey does not alter the rate constant. [...] the rate constant drops (hypothetically, to [...] of its original value). The two residues are not [...], so BioKey binding has no effect. [...] can react with Pt2(pop)4^4-, and [...] the binding site. Therefore, the folded [...] (hypothetically, one-third that of the unfolded [...]), and BioKey binding dramatically attenuates the oxidation of the folded protein.

[...] the folded rate constants, normalized by the unfolded rate constant. Thus, descriptors can be calculated as descriptor(folded) = $(\text{folded} - \text{unfolded})/\text{unfolded}$ and descriptor(BioKey) = $(\text{BioKey} - \text{unfolded})/\text{unfolded}$. This linear formula [...] residues that are not occluded [...] binding and will [...]. Therefore, an advantageous descriptor [...] calculates the difference in square roots, i.e.:

$$\text{descriptor(folded)} = \frac{(\text{rate folded})^{1/2} - (\text{rate unfolded})^{1/2}}{(\text{rate unfolded})^{1/2}}$$

$$\text{descriptor(BioKey)} = \frac{(\text{rate BioKey})^{1/2} - (\text{rate unfolded})^{1/2}}{(\text{rate unfolded})^{1/2}}$$

Descriptors calculated in this manner from the data shown above are:

protein                 state    descriptor Ru   descriptor Pt
glutamine synthetase    folded   0.292           0.764
glutamine synthetase    BioKey   0.711           0.764
lysozyme                folded   0.908           0.422
lysozyme                BioKey   0.908           0.967

These descriptors show that glutamine synthetase has a large amount of surface methionine and that much of the surface methionine is near the active site, that glutamine synthetase has some surface tryptophan not near the active site, that lysozyme has very little surface methionine, and that lysozyme has surface tryptophan in the active site.
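The linear and square-root descriptor forms can be compared numerically with a minimal sketch (function names hypothetical). It follows the sign convention of the formulas above, (state - unfolded); for a folded rate one-third of the unfolded rate, the hypothetical lysozyme/Pt2(pop)4^4- case, the square-root form gives about -0.423, whose magnitude matches the 0.422 listed for that entry in the descriptor table to within rounding.

```python
import math


def sqrt_descriptor(rate_state, rate_unfolded):
    """[(rate state)^1/2 - (rate unfolded)^1/2] / (rate unfolded)^1/2"""
    root_unfolded = math.sqrt(rate_unfolded)
    return (math.sqrt(rate_state) - root_unfolded) / root_unfolded


def linear_descriptor(rate_state, rate_unfolded):
    """(rate state - rate unfolded) / rate unfolded"""
    return (rate_state - rate_unfolded) / rate_unfolded


# Only the ratio of rates matters, so the unfolded rate is set to 1.
print(round(sqrt_descriptor(1 / 3, 1.0), 3))    # prints -0.423
print(round(linear_descriptor(1 / 3, 1.0), 3))  # prints -0.667
```

Both forms are zero when folding or BioKey binding leaves the rate unchanged; they differ only in how strongly a given drop in rate is reflected in the descriptor.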

The example above is a case where the two reagents exhibit a reactivity profile that is 100% for a single amino acid, i.e., 100% methionine for Ru(tpy)(bpy)O^2+ and 100% tryptophan for Pt2(pop)4^4-, and 0% for all other amino acids. In these cases, the concentration of protein used to calculate the rate constants is simply obtained by multiplying the concentration of the protein by the fraction of the reactive amino acid in the sequence. If a given reagent had a reactivity profile of 40% isoleucine, 40% leucine, and 20% alanine, the protein concentration would be determined as the total protein concentration times ((0.40 times the fraction of the total residues which are isoleucine residues) + (0.40 times the fraction which are leucine residues) + (0.20 times the fraction which are alanine residues)).
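The weighted protein concentration described above can be sketched as follows, assuming the fraction of each amino acid is taken as its count divided by the sequence length. The sequence and the function name are hypothetical.

```python
def effective_concentration(protein_conc, sequence, profile):
    """Protein concentration weighted by a reagent's reactivity profile.

    protein_conc: total molar concentration of the protein
    sequence:     one-letter amino acid sequence of the protein
    profile:      amino acid -> fraction of the reagent's reactivity (sums to 1)
    """
    n = len(sequence)
    # Sum over amino acids of (profile weight) x (fraction of residues of that type).
    weighted = sum(weight * sequence.count(aa) / n for aa, weight in profile.items())
    return protein_conc * weighted


# Hypothetical 10-residue sequence with the 40% Ile / 40% Leu / 20% Ala profile.
profile = {"I": 0.40, "L": 0.40, "A": 0.20}
print(effective_concentration(1e-6, "IILLAAGGGG", profile))
```

For a single-amino-acid profile such as {"M": 1.0}, the function reduces to the simple case in the text: protein concentration times the fraction of methionine in the sequence.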



Hypothetical Example 5 - Using Similarities to Improve Lead Generation



Suppose the database exists with both aptamer descriptors [and reactivity descriptors] for lysozyme and many other proteins. Then aptamer [and reactivity] descriptors are determined for a new protein X and compared to the other proteins in the database, [...] to be [...]% similar to lysozyme, [...]% similar to ribosomal L22 protein, 8% similar to glutamine synthetase, and [...] to all of the other proteins in the database. [...] percentages may sum to less than 100% [...], so the known inhibitors [...]. The known inhibitors of proteins with a >3% similarity are tabulated and used to [...] for protein X. Suppose that lysozyme is inhibited by [...] a benzodiazepine, ribosomal L22 protein [...] natural product, and [...] also inhibited by a benzodiazepine. The [...] may be synthesized as libraries [...] (Ostresh, [...] Houghten, Chem. [...]), which would probably be the most efficient [...], since quinazolinone and benzodiazepine libraries are part of larger libraries already and [...] these compounds by the database [...] the parent scaffolds to bind protein X. In contrast, the natural product 3 is likely part of [...] compounds that can be screened individually. [...] structural descriptors can be calculated [...] within a certain similarity radius [...] compounds according to known compound [...] methods, as in Cummins et al. 1996 or Matter 1997.
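The tabulation step of this example, keeping reference proteins above the 3% similarity cutoff and collecting their known inhibitors, might look like the sketch below. The database is represented as plain dictionaries, and the protein names, similarity values, and inhibitor labels are placeholders rather than data from this example.

```python
def tabulate_leads(similarity, inhibitors, cutoff=0.03):
    """List reference proteins above a similarity cutoff with their known inhibitors.

    similarity: reference protein -> similarity to protein X (0..1)
    inhibitors: reference protein -> list of known inhibitor scaffolds
    Returns (protein, similarity, inhibitors) tuples, most similar first.
    """
    hits = [(p, s) for p, s in similarity.items() if s > cutoff]
    hits.sort(key=lambda item: item[1], reverse=True)
    return [(p, s, inhibitors.get(p, [])) for p, s in hits]


# Hypothetical database entries; names and values are placeholders.
similarity = {"ref_A": 0.40, "ref_B": 0.08, "ref_C": 0.01}
inhibitors = {"ref_A": ["scaffold_1"], "ref_B": ["scaffold_2"], "ref_C": ["scaffold_3"]}
for row in tabulate_leads(similarity, inhibitors):
    print(row)
```

The resulting list is the set of lead drugs whose scaffolds would seed the enriched combinatorial library; ref_C falls below the cutoff and its inhibitor is not carried forward.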

[...] recitation of a [...], [...] also a recitation of all possible subranges [...]. Where a class is recited, this invention also extends to [...] and to all [...] combinations of members or subclasses of that class.
CLAIMS



1. A method of identifying drugs which mediate the biological activity of a protein of interest which comprises:

(a) determining the reactivity of the protein of interest, in one or more states, with one or more reagents, thereby obtaining a reactivity descriptor for said protein, and similarly determining comparable reactivity descriptors for one or more reference proteins, each reference protein having a biological activity known to be mediated by one or more known reference drugs;

(b) comparing the reactivity descriptor for the protein of interest with the reactivity descriptors of said reference proteins, identifying lead proteins, which are reference proteins whose reactivity descriptors are substantially similar to those of the protein of interest, and identifying lead drugs, which are reference drugs which mediate the biological activity of lead proteins;

(c) preparing a combinatorial compound library which is enriched for lead drugs and analogues thereof; and

(d) screening said combinatorial compound library for drugs which mediate the biological activity of the protein of interest.

2. The method of claim 1 wherein the reactivity of the protein of interest is determined in both the folded and unfolded state.

3. The method of claim 1 wherein the reactivity of the protein of interest is determined in both the free state and in a ligand-bound state.

4. The method of claim 3 in which the ligand is an oligonucleotide.



5. The method of claim 3 in which the ligand is a peptide or a peptoid.

6. The method of claim 3 wherein the ligand is identified by screening a combinatorial library.

7. The method of claim 1 in which the protein of interest, and the reference proteins, are further characterized by the identification of one or more aptamers which bind said proteins, and the similarity of the protein of interest to each of said reference proteins is determined at least in part on the basis of the similarity of the [...] of the respective aptamers which bind them.

8. The method of claim 7 in which the aptamers are oligonucleotides.

9. The method of claim 7 in which the protein of interest, and the reference proteins, are further characterized on the basis of the individual nucleotides of said aptamers which contact said proteins.

10. The method of claim 7 in which the protein of interest, and the reference proteins, are further characterized on the basis of the predicted or actual secondary structure of the aptamers which bind them.

11. A method of identifying drugs which mediate the biological activity of a protein of interest which comprises:

(a) determining the sequences of aptamers which bind the protein of interest and the sequences of aptamers which bind one or more reference proteins, each reference protein having a biological activity known to be mediated by one or more known reference drugs, thereby obtaining aptamer descriptors for said proteins;

(b) comparing the aptamer descriptors for the protein of interest with the aptamer descriptor for each reference protein, identifying lead proteins, which are reference proteins whose aptamer descriptors are substantially similar to those of the protein of interest, and identifying lead drugs, which are reference drugs which mediate the biological activity of lead proteins;

(c) preparing a combinatorial compound library which is enriched for lead drugs and analogues thereof; and

(d) screening said combinatorial compound library for drugs which mediate the biological activity of the protein of interest.

12. The method of claim 11 in which the aptamers are oligonucleotides.

13. The method of claim 11 in which the protein of interest, and the reference proteins, are further characterized on the basis of the individual nucleotides of said aptamers which contact said proteins.

14. The method of claim 11 in which the protein of interest, and the reference proteins, are further characterized on the basis of the predicted or actual secondary structure of the aptamers which bind them.